Spark 1.2.0 MLlib KMeans: Out Of Memory Error

April 15, 2015

I'm new to Spark, and I use the KMeans algorithm to cluster a data set whose size is 484M, with 213104 dimensions. My code is as follows:

val k = args(0).toInt
val maxIter = args(1).toInt
val model = new KMeans().setK(k).setMaxIterations(maxIter).setEpsilon(1e-1).run(trainingData)
val modelRDD = sc.makeRDD(model.clusterCenters)
val saveModelPath = "/home/work/kmeansModel_" + args(0)
if (Files.exists(Paths.get(saveModelPath))) {
  FileUtils.deleteDirectory(new File(saveModelPath))
}
modelRDD.saveAsTextFile(saveModelPath)
val loss = model.computeCost(trainingData)
println("Within Set Sum of Squared Errors = " + loss)

When I set k = 150, it works, but when I set k = 300, it throws a java.lang.OutOfMemoryError: Java heap space exception. My configuration:

--executor-memory 30g --driver-memory 4g --conf spark.shuffle.spill=false --conf spark.storage.memoryFraction=0.1
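For a sense of scale, the numbers in the question can be turned into a rough memory estimate. The sketch below assumes that each cluster center is held as a dense array of 8-byte doubles, so one full copy of the centers costs k * dimensions * 8 bytes; the object and method names are hypothetical and not part of MLlib. Under that assumption, k = 150 needs 255,724,800 bytes (about 244 MiB) per copy and k = 300 needs 511,449,600 bytes (about 488 MiB), and several such copies may be alive at once during an iteration.

```scala
object CenterMemoryEstimate {
  // Bytes needed for k dense centers of `dims` doubles each.
  // Assumption: 8 bytes per double, no object or array header overhead counted.
  def denseCenterBytes(k: Int, dims: Int): Long = k.toLong * dims.toLong * 8L

  // Convert a byte count to mebibytes for readability.
  def toMiB(bytes: Long): Double = bytes / (1024.0 * 1024.0)

  def main(args: Array[String]): Unit = {
    val dims = 213104 // dimensionality stated in the question
    for (k <- Seq(150, 300)) {
      val bytes = denseCenterBytes(k, dims)
      println(f"k = $k%d: $bytes%d bytes (~${toMiB(bytes)}%.1f MiB) per dense copy of the centers")
    }
  }
}
```

Doubling k doubles every dense copy of the centers, which is one plausible reason a 4g heap copes with k = 150 but not with k = 300.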

You should tell us more about your environment. Are you running on a real cluster, or in local mode?

Since you said you are new to Spark, I assume you are playing around on a local machine. In that case, I think this post can help you.
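The distinction matters because in local mode Spark runs its tasks inside the driver JVM, so --executor-memory 30g has no effect there and the 4g driver heap is the real limit. A quick way to see how much heap a JVM actually received is sketched below; HeapCheck is a hypothetical helper name, and the same expression can be evaluated in spark-shell.

```scala
object HeapCheck {
  // Maximum heap the current JVM will attempt to use, in MiB.
  def maxHeapMiB: Long = Runtime.getRuntime.maxMemory / (1024L * 1024L)

  def main(args: Array[String]): Unit = {
    // Compare this figure with the memory settings passed on the command line.
    println(s"Max JVM heap: $maxHeapMiB MiB")
  }
}
```

If the reported value is close to 4 GiB rather than 30 GiB, the job is bounded by the driver memory setting.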

Update

Your error is not an OOM, it is a heap space exception. Did you cache the RDD?
