Amazon S3 - Spark Storage Best Practice
I am planning to deploy a Spark cluster. Spark supports many storage backends such as HDFS, S3, HBase, Cassandra, Hive, etc.
Since I am not migrating from Hadoop to Spark, I have no existing 'big data' storage and am still trying to figure out which one is the best choice.
What is the best way to store data to take advantage of Spark to the fullest? My use case is tracking user behavior data and using Spark ETL to build a data warehouse and other data products.
One thing that came to mind is attaching HDFS storage to each worker node, the way a Hadoop deployment co-locates storage with compute.
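To make the use case concrete, here is a minimal PySpark sketch of the kind of ETL I have in mind, assuming raw tracking events land as JSON files in an S3 bucket and get rewritten as a partitioned Parquet table; the bucket names, paths, and column names below are placeholders, not an existing pipeline:

# Minimal sketch: read raw tracking events from S3, keep the fields the
# warehouse needs, and write a date-partitioned Parquet table back to S3.
# Bucket names, paths, and column names are placeholders.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = (
    SparkSession.builder
    .appName("user-behavior-etl")
    .getOrCreate()
)

# Raw events, e.g. one JSON object per line dropped by the tracking pipeline.
raw = spark.read.json("s3a://my-tracking-bucket/raw/events/")

# Light transformation: select the warehouse columns and derive a date column
# so the output can be partitioned for efficient downstream scans.
events = (
    raw.select("user_id", "event_type", "timestamp")
       .withColumn("event_date", F.to_date(F.col("timestamp")))
)

# Columnar, partitioned output is what Spark reads back most efficiently,
# whether the files end up on HDFS or S3.
(
    events.write
          .mode("overwrite")
          .partitionBy("event_date")
          .parquet("s3a://my-warehouse-bucket/events/")
)

The same job would run unchanged against HDFS by swapping the s3a:// paths for hdfs:// paths, so my question is really about which backend to standardize on.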