Spark 1.2.0 MLlib KMeans: OutOfMemoryError


I'm new to Spark. I use the KMeans algorithm to cluster a data set whose size is 484 MB, with 213104 dimensions. My code is as follows:

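    import java.io.File
    import java.nio.file.{Files, Paths}
    import org.apache.commons.io.FileUtils
    import org.apache.spark.mllib.clustering.KMeans

    // sc (SparkContext) and trainingData (RDD[Vector]) are created earlier in the program.
    val k = args(0).toInt
    val maxIter = args(1).toInt
    val model = new KMeans()
      .setK(k)
      .setMaxIterations(maxIter)
      .setEpsilon(1e-1)
      .run(trainingData)

    // Save the cluster centers as text, replacing any previous output directory.
    val modelRDD = sc.makeRDD(model.clusterCenters)
    val saveModelPath = "/home/work/kmeansmodel_" + args(0)
    if (Files.exists(Paths.get(saveModelPath))) {
      FileUtils.deleteDirectory(new File(saveModelPath))
    }
    modelRDD.saveAsTextFile(saveModelPath)

    val loss = model.computeCost(trainingData)
    println("Within Set Sum of Squared Errors = " + loss)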

When I set k = 150 it works, but when I set k = 300 it throws java.lang.OutOfMemoryError: Java heap space. My configuration is:

    --executor-memory 30g --driver-memory 4g --conf spark.shuffle.spill=false --conf spark.storage.memoryFraction=0.1
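One way to see why going from k = 150 to k = 300 matters so much is a back-of-the-envelope estimate of the cluster centers alone. The sketch below assumes each center is held as a dense array of 8-byte doubles (k-means centers are averages, so they are generally not sparse); it ignores JVM object overhead. How many copies MLlib holds at once (for example on the driver and in each running task) is an assumption not modeled here, and the object and method names are hypothetical.

```scala
// Rough estimate of the memory used by k dense cluster centers of dimension d,
// assuming each center is a dense array of 8-byte doubles.
// Ignores JVM object headers and any extra copies Spark may create.
object CenterMemoryEstimate {

  /** Bytes needed for k dense centers with d dimensions (8 bytes per double). */
  def denseCentersBytes(k: Int, d: Int): Long = k.toLong * d.toLong * 8L

  /** Convert bytes to mebibytes (1 MiB = 1024 * 1024 bytes). */
  def toMiB(bytes: Long): Double = bytes.toDouble / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val d = 213104 // dimensionality stated in the question
    for (k <- Seq(150, 300)) {
      val bytes = denseCentersBytes(k, d)
      println(f"k = $k%d: $bytes%d bytes (about ${toMiB(bytes)}%.1f MiB) per copy of the centers")
    }
  }
}
```

With d = 213104 this gives roughly 244 MiB per copy for k = 150 and roughly 488 MiB for k = 300, so every extra copy held in a 4 GB driver heap or an executor heap costs about twice as much after the change.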

You should tell us more about your environment. Are you running on a real cluster, or in local mode?

Since you said you are new to Spark, I assume you are playing around on your local machine. In that case, I think this post can help you.
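If the job does run in local mode, the executor lives inside the driver JVM, so the usable heap is set by --driver-memory and the 30g given to --executor-memory does not apply. A sketch of a launch command under that assumption follows; the class name, jar name, and the value 20 for maxIterations are hypothetical placeholders, and 30g assumes the machine actually has that much free RAM.

```shell
# Local mode: driver and executor share one JVM, so size its heap via --driver-memory.
# com.example.KMeansJob and kmeans-job.jar are hypothetical placeholders.
# The trailing arguments are k and maxIterations (read as args(0) and args(1));
# 300 and 20 are example values.
spark-submit \
  --master "local[*]" \
  --driver-memory 30g \
  --conf spark.storage.memoryFraction=0.1 \
  --class com.example.KMeansJob \
  kmeans-job.jar 300 20
```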

Update

Note that your error is specifically a Java heap space error: the JVM heap ran out, not the machine's physical memory. Did you cache the RDD?

