scala - java.sql.SQLException: No suitable driver found when loading DataFrame into Spark SQL
I'm hitting a strange problem when trying to load a JDBC DataFrame into Spark SQL.
I've tried several Spark clusters: YARN, a standalone cluster, and pseudo-distributed mode on my laptop. It's reproducible on both Spark 1.3.0 and 1.3.1. The problem occurs both in spark-shell and when executing the code with spark-submit. I've tried the MySQL and MS SQL JDBC drivers without success.
Consider the following sample:

    val driver = "com.mysql.jdbc.Driver"
    val url = "jdbc:mysql://localhost:3306/test"

    val t1 = {
      sqlContext.load("jdbc", Map(
        "url" -> url,
        "driver" -> driver,
        "dbtable" -> "t1",
        "partitionColumn" -> "id",
        "lowerBound" -> "0",
        "upperBound" -> "100",
        "numPartitions" -> "50"
      ))
    }

So far so good; the schema gets resolved properly:

    t1: org.apache.spark.sql.DataFrame = [id: int, name: string]

But when I evaluate the DataFrame:

    t1.take(1)

the following exception occurs:

    15/04/29 01:56:44 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 192.168.1.42): java.sql.SQLException: No suitable driver found for jdbc:mysql://<hostname>:3306/test
        at java.sql.DriverManager.getConnection(DriverManager.java:689)
        at java.sql.DriverManager.getConnection(DriverManager.java:270)
        at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:158)
        at org.apache.spark.sql.jdbc.JDBCRDD$$anonfun$getConnector$1.apply(JDBCRDD.scala:150)
        at org.apache.spark.sql.jdbc.JDBCRDD$$anon$1.<init>(JDBCRDD.scala:317)
        at org.apache.spark.sql.jdbc.JDBCRDD.compute(JDBCRDD.scala:309)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.rdd.MapPartitionsRDD.compute(MapPartitionsRDD.scala:35)
        at org.apache.spark.rdd.RDD.computeOrReadCheckpoint(RDD.scala:277)
        at org.apache.spark.rdd.RDD.iterator(RDD.scala:244)
        at org.apache.spark.scheduler.ResultTask.runTask(ResultTask.scala:61)
        at org.apache.spark.scheduler.Task.run(Task.scala:64)
        at org.apache.spark.executor.Executor$TaskRunner.run(Executor.scala:203)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

When I try to open a JDBC connection on an executor:

    import java.sql.DriverManager

    sc.parallelize(0 until 2, 2).map { i =>
      // Load the driver class and open a connection inside the task itself
      Class.forName(driver)
      val conn = DriverManager.getConnection(url)
      conn.close()
      i
    }.collect()

it works perfectly:

    res1: Array[Int] = Array(0, 1)

When I run the same code on local Spark, it works too:

    scala> t1.take(1)
    ...
    res0: Array[org.apache.spark.sql.Row] = Array([1,one])

I'm using Spark pre-built with Hadoop 2.4 support.
The easiest way to reproduce the problem is to start Spark in pseudo-distributed mode with the start-all.sh script and run the following command:

    /path/to/spark-shell --master spark://<hostname>:7077 \
      --jars /path/to/mysql-connector-java-5.1.35.jar \
      --driver-class-path /path/to/mysql-connector-java-5.1.35.jar

Is there a way to work around this? It looks like a severe problem, so it's strange that googling doesn't help here.
Apparently this issue has already been reported:
https://issues.apache.org/jira/browse/SPARK-6913
The problem is that java.sql.DriverManager doesn't see drivers loaded by classloaders other than the bootstrap classloader.
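To check which drivers DriverManager is willing to expose to a given piece of code, something like the following sketch may help. The object name `VisibleDrivers` and its `describe` method are made up for illustration; only `DriverManager.getDrivers` comes from the JDK.

```scala
import java.sql.DriverManager
import scala.collection.JavaConverters._

object VisibleDrivers {
  // Returns "driver class (classloader)" for every registered driver
  // that DriverManager exposes to the calling code.
  def describe(): List[String] =
    DriverManager.getDrivers.asScala.map { d =>
      // getClassLoader returns null for classes loaded by the bootstrap loader
      val loader = Option(d.getClass.getClassLoader).map(_.toString).getOrElse("bootstrap")
      s"${d.getClass.getName} ($loader)"
    }.toList

  def main(args: Array[String]): Unit = describe().foreach(println)
}
```

The list depends on which classloader the caller was loaded by, which would be consistent with the observation in the question: a connection opened from a user closure succeeds, while the same URL fails when Spark's own JDBC code asks for it.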
As a temporary workaround, it's possible to add the required drivers to the boot classpath of the executors.
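One possible way to apply this workaround, assuming the driver JAR exists at the same path on every worker node, is Spark's `spark.executor.extraClassPath` setting, which prepends entries to the classpath of the executor JVMs. Whether this matches exactly what is meant by "boot classpath" here is an assumption, and the application name and JAR path below are placeholders.

```scala
import org.apache.spark.{SparkConf, SparkContext}

// Assumption: the JAR is already present at this path on each worker node;
// this setting does not ship the file to the executors.
val conf = new SparkConf()
  .setAppName("jdbc-driver-workaround") // placeholder name
  .set("spark.executor.extraClassPath", "/path/to/mysql-connector-java-5.1.35.jar")

// The master URL is expected to be supplied by spark-submit.
val sc = new SparkContext(conf)
```

In spark-shell the context already exists, so there the same property would have to be passed on the command line with `--conf` instead.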
UPDATE: This pull request fixes the problem: https://github.com/apache/spark/pull/5782
UPDATE 2: The fix has been merged into Spark 1.4.