Amazon S3 - Spark Storage Best Practice
I am planning to deploy a Spark cluster. Spark supports many storage backends such as HDFS, S3, HBase, Cassandra, Hive, etc.
Since I am not migrating from Hadoop to Spark, I have no existing 'big data' storage and am still trying to figure out which one is the best choice.
What is the best way to store data to take advantage of Spark to the fullest? My use case is tracking user behavior data and using Spark ETL to build a data warehouse and other data products.
One thing that came to mind is attaching HDFS storage to each worker node, the way a Hadoop deployment co-locates storage with compute.
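To make the use case concrete, here is a minimal PySpark sketch of the kind of ETL I have in mind, assuming raw tracking events land as JSON files in an S3 bucket and get rewritten as a partitioned Parquet table; the bucket names, paths, and column names below are placeholders, not an existing pipeline:

# Minimal sketch: read raw tracking events from S3, keep the fields the
# warehouse needs, and write a date-partitioned Parquet table back to S3.
# Bucket names, paths, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("user-behavior-etl")
    .getOrCreate()
)

# Raw events, e.g. one JSON object per line dropped by the tracking pipeline.
raw = spark.read.json("s3a://my-tracking-bucket/raw/events/")

# Light transformation: select the warehouse columns and derive a date column
# so the output can be partitioned for efficient downstream scans.
events = (
    raw.select("user_id", "event_type", "timestamp")
       .withColumn("event_date", F.to_date(F.col("timestamp")))
)

# Columnar, partitioned output is what Spark reads back most efficiently,
# whether the files end up on HDFS or S3.
(
    events.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3a://my-warehouse-bucket/events/")
)

The same job would run unchanged against HDFS by swapping the s3a:// paths for hdfs:// paths, so my question is really about which backend to standardize on.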