How to merge text files using mapping and reducing in Java Spark MLlib?


I have a large dataset stored on Hadoop (a YARN cluster) and want to train a support vector machine classifier on it. Features are extracted from each data point in the dataset and saved in LIBSVM format. Spark MLlib can read these files using MLUtils.loadLibSVMFile(JavaSparkContext context, String directory). Every file has one line of doubles ending in a newline character; the line represents the feature values.

i want concatenate these files javardd. can use .textfile("../*") somekind of .join or .union statement? not understand how ...

Could someone please help? I think more people would like to know how to do this efficiently.

SparkContext.textFile("/path/to/file/*") will read all matched files and represent them as a single large RDD.

And I think MLUtils.loadLibSVMFile(sc, "/path/to/file/*") will load the features for you. Have you tried it?
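A minimal sketch of that approach: passing a glob pattern to MLUtils.loadLibSVMFile merges every matched file into one RDD of LabeledPoint, ready for an MLlib classifier. The path "/path/to/files/*" and the app name are placeholders; note that loadLibSVMFile takes a SparkContext, which you can get from a JavaSparkContext via .sc().

```java
import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.mllib.regression.LabeledPoint;
import org.apache.spark.mllib.util.MLUtils;

public class LoadLibSvmGlob {
    public static void main(String[] args) {
        SparkConf conf = new SparkConf().setAppName("load-libsvm-glob");
        JavaSparkContext jsc = new JavaSparkContext(conf);

        // The glob matches every LIBSVM file in the directory, and Spark
        // reads them all into a single combined RDD -- no explicit
        // .union() calls are needed.
        JavaRDD<LabeledPoint> points =
                MLUtils.loadLibSVMFile(jsc.sc(), "/path/to/files/*").toJavaRDD();

        // The merged RDD can now be fed directly to an MLlib trainer
        // such as SVMWithSGD.
        System.out.println("loaded points: " + points.count());

        jsc.stop();
    }
}
```

If the inputs really are separate RDDs already (for example, loaded from unrelated paths), JavaRDD.union(other) would concatenate them; but for files under a common directory, the glob path is the simpler route.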

