amazon s3 - Spark Storage Best Practice -


I am planning to deploy a Spark cluster. Spark supports many storage backends such as HDFS, S3, HBase, Cassandra, Hive, etc.

Since I am not migrating from Hadoop to Spark, I have no existing 'big data' storage and am still trying to figure out which one is the best choice.

What is the best way to store data to get the most out of Spark? My use case is tracking user behavior data, then using Spark for ETL to build a data warehouse and other data products, roughly along the lines of the sketch below.
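To make the use case concrete, here is a minimal sketch of the kind of ETL job I have in mind, assuming the tracking pipeline lands raw JSON events in S3; the bucket names, paths, and field names are hypothetical placeholders, not an existing setup:

```scala
import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

object BehaviorEtl {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("user-behavior-etl")
      .getOrCreate()

    // Raw events as dumped by the tracking pipeline (assumed JSON layout)
    val events = spark.read.json("s3a://my-tracking-bucket/raw/events/")

    // Basic ETL: derive a date column and keep only the fields the warehouse needs
    val cleaned = events
      .withColumn("event_date", to_date(col("timestamp")))
      .select("user_id", "event_type", "event_date")

    // Write columnar, partitioned output for downstream warehouse queries
    cleaned.write
      .mode("overwrite")
      .partitionBy("event_date")
      .parquet("s3a://my-warehouse-bucket/warehouse/user_behavior/")

    spark.stop()
  }
}
```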

One thing that came to mind is having HDFS storage on each worker node, the way the Hadoop storage scheme works, so the data is co-located with the compute.
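As far as I understand, if I went with HDFS co-located with the workers instead of S3, the Spark code itself would barely change; only the input URI does (the namenode host and port below are placeholders), so the decision seems to be mostly about data locality, cost, and operations rather than the API:

```scala
// Same read as above, pointed at co-located HDFS instead of S3
val events = spark.read.json("hdfs://namenode:8020/raw/events/")
```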

