Spark 1.2.0 MLlib KMeans: OutOfMemoryError
I'm new to Spark, and I use the KMeans algorithm to cluster a data set whose size is 484M, with 213104 dimensions. My code is as follows:
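    import java.io.File
    import java.nio.file.{Files, Paths}
    import org.apache.commons.io.FileUtils // assumed to be the Commons IO FileUtils
    import org.apache.spark.mllib.clustering.KMeans

    // Number of clusters and iteration limit come from the command line.
    val k = args(0).toInt
    val maxIter = args(1).toInt
    val model = new KMeans().setK(k).setMaxIterations(maxIter).setEpsilon(1e-1).run(trainingData)
    val modelRDD = sc.makeRDD(model.clusterCenters)
    val saveModelPath = "/home/work/kmeansmodel_" + args(0)
    // Remove any previous output so saveAsTextFile does not fail.
    if (Files.exists(Paths.get(saveModelPath))) {
      FileUtils.deleteDirectory(new File(saveModelPath))
    }
    modelRDD.saveAsTextFile(saveModelPath)
    val loss = model.computeCost(trainingData)
    println("within set sum of squared errors = " + loss)

When I set k = 150, it works, but when I set k = 300, it throws a java.lang.OutOfMemoryError: Java heap space exception. My configuration: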
    --executor-memory 30g --driver-memory 4g --conf spark.shuffle.spill=false --conf spark.storage.memoryFraction=0.1
You should tell us more about your environment. Are you running on a real cluster, or in local mode?
Since you said you are new to Spark, I assume you are playing around on a local machine. In that case, I think this post can help you.
Update
Your error is not a generic OOM; it is specifically a Java heap space exception. Did you cache the RDD?
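To put a rough number on the heap pressure, here is a small sketch that is not from the original posts. It assumes each cluster center is held as a dense array of 8-byte doubles and ignores JVM object overhead; that is an assumption about how the centers are stored, not something confirmed in the thread. The names CenterMemoryEstimate, denseCenterBytes and toMiB are made up for illustration.

```scala
// Rough heap estimate for dense k-means centers.
// Assumption: every center is a dense array of 8-byte doubles;
// JVM object headers and any temporary copies are ignored.
object CenterMemoryEstimate {
  val BytesPerDouble = 8L

  /** Bytes needed to hold k dense centers of the given dimensionality. */
  def denseCenterBytes(k: Int, dims: Int): Long =
    k.toLong * dims.toLong * BytesPerDouble

  /** Converts bytes to mebibytes for readability. */
  def toMiB(bytes: Long): Double = bytes.toDouble / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val dims = 213104 // dimensionality stated in the question
    for (k <- Seq(150, 300)) {
      val bytes = denseCenterBytes(k, dims)
      println(f"k = $k%d: $bytes%d bytes (about ${toMiB(bytes)}%.0f MiB) per copy of the centers")
    }
  }
}
```

Under that assumption, one copy of the centers takes 255,724,800 bytes (about 244 MiB) for k = 150 and 511,449,600 bytes (about 488 MiB) for k = 300. Any extra copies held during training would multiply that figure, which may matter against a 4g driver heap.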