scala - Spark: fold vs. reduce performance?
In a big data processing job, does the function "fold" have lower computation performance than the function "reduce"?
For instance, I have the following two functions:
array1.indices.zip(array1).map(x => x._1 * x._2).reduce(_ + _)
array1.indices.zip(array1).map(x => x._1 * x._2).fold(0.0) {_ + _}
array1 is a huge RDD array. Which function has higher computation performance, given the same cluster settings?
They are indeed the same, as muhuk pointed out: the guts of the Spark implementation are merely a call to the iterator.
fold source:
(iter: Iterator[T]) => iter.fold(zeroValue)(cleanOp)
reduce source:
iter => if (iter.hasNext) Some(iter.reduceLeft(cleanF)) else None
So both are mostly just calling the Scala implementations.
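To make the comparison concrete, here is a minimal sketch in plain Scala (no Spark dependency) that mimics the two per-partition functions quoted above. The names FoldVsReduceSketch, foldPartition and reducePartition, the sample values in array1, and the two-element "partitions" are illustrative assumptions, not part of Spark. The sketch only shows that both paths do the same per-element work and differ in how an empty partition is handled.

```scala
// Plain Scala stand-ins for the per-partition closures quoted above.
// All names here are hypothetical; nothing in this object is Spark API.
object FoldVsReduceSketch {
  // Mirrors the fold closure: starts from zeroValue, so an empty
  // partition simply yields zeroValue.
  def foldPartition[T](iter: Iterator[T], zeroValue: T)(op: (T, T) => T): T =
    iter.fold(zeroValue)(op)

  // Mirrors the reduce closure: an empty partition yields None
  // instead of failing on reduceLeft.
  def reducePartition[T](iter: Iterator[T])(f: (T, T) => T): Option[T] =
    if (iter.hasNext) Some(iter.reduceLeft(f)) else None

  // Small local stand-in for the question's array1 (assumed values).
  val array1: Array[Double] = Array(1.0, 2.0, 3.0, 4.0)

  // Same index * value products as in the question.
  val products: Seq[Double] = array1.indices.zip(array1).map(x => x._1 * x._2)

  // Pretend the data is split into two partitions of two elements each.
  val partitions: Seq[Seq[Double]] = products.grouped(2).toSeq

  // fold: fold each partition, then fold the partial results.
  val foldResult: Double =
    partitions.map(p => foldPartition(p.iterator, 0.0)(_ + _)).fold(0.0)(_ + _)

  // reduce: reduce each non-empty partition, then reduce the partial results.
  val reduceResult: Double =
    partitions.flatMap(p => reducePartition(p.iterator)(_ + _)).reduceLeft(_ + _)

  def main(args: Array[String]): Unit = {
    println(s"fold = $foldResult, reduce = $reduceResult") // both are 20.0
  }
}
```

Each element is visited once in either case; fold performs one extra application of the operator per partition to combine with the zero value, which is negligible for a large dataset.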