How to merge text files using mapping and reducing in Java Spark MLlib?
I have a large dataset stored on Hadoop (a YARN cluster) and want to train a support vector machine classifier on it. Features are extracted for each data point in the dataset and saved in LIBSVM format. Spark MLlib can read such files using MLUtils.loadLibSVMFile(SparkContext sc, String path). Every file has one line of doubles ending in a newline character; the line represents the values of the features.
I want to concatenate these files into one JavaRDD. Can I use .textFile("../*") with some kind of .join or .union statement? I do not understand how to do this.
Could somebody please be so kind as to help? I think more people would like to know how to do this efficiently.
    sparkContext.textFile("/path/to/file/*")

will read all matched files and represent them as a single large RDD.
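For illustration, a minimal sketch of what this means in Java. The second half shows that the glob is equivalent to reading files individually and chaining union(), which is what the question asks about; the part-file names and app name are hypothetical:

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;

    public class MergeTextFiles {
        public static void main(String[] args) {
            JavaSparkContext jsc =
                new JavaSparkContext(new SparkConf().setAppName("MergeTextFiles"));

            // One call with a glob pattern reads every matching file
            // into a single RDD of lines:
            JavaRDD<String> all = jsc.textFile("/path/to/file/*");

            // Equivalent, but more verbose: read files one by one and
            // chain union() (the part-file names are hypothetical):
            JavaRDD<String> a = jsc.textFile("/path/to/file/part-00000");
            JavaRDD<String> b = jsc.textFile("/path/to/file/part-00001");
            JavaRDD<String> merged = a.union(b);

            jsc.stop();
        }
    }

So no explicit join or union is needed; the glob already merges the files into one RDD.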
And I think

    MLUtils.loadLibSVMFile(sc, "/path/to/file/*")

will load the features for you. Have you tried it?
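A minimal end-to-end sketch in Java, assuming the data really is in LIBSVM format under the given path (the path and the iteration count are placeholders). Note that MLUtils.loadLibSVMFile takes the underlying SparkContext, so from Java you pass jsc.sc():

    import org.apache.spark.SparkConf;
    import org.apache.spark.api.java.JavaRDD;
    import org.apache.spark.api.java.JavaSparkContext;
    import org.apache.spark.mllib.classification.SVMModel;
    import org.apache.spark.mllib.classification.SVMWithSGD;
    import org.apache.spark.mllib.regression.LabeledPoint;
    import org.apache.spark.mllib.util.MLUtils;

    public class TrainSVMFromLibSVM {
        public static void main(String[] args) {
            JavaSparkContext jsc =
                new JavaSparkContext(new SparkConf().setAppName("TrainSVMFromLibSVM"));

            // loadLibSVMFile takes the underlying SparkContext (jsc.sc());
            // the glob makes it read every matching file at once.
            JavaRDD<LabeledPoint> data =
                MLUtils.loadLibSVMFile(jsc.sc(), "/path/to/file/*").toJavaRDD();

            // Train a linear SVM; 100 iterations is an arbitrary choice here.
            SVMModel model = SVMWithSGD.train(data.rdd(), 100);

            jsc.stop();
        }
    }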