pyspark - Does Spark Streaming process RDDs one by one?
I wrote a Spark Streaming program in PySpark. It receives a live input text stream via socketTextStream, applies some transformations, and saves the result as CSV files with saveAsTextFile. No Spark Streaming window operation is used, and no previous data is needed to produce the output.
But it seems that Spark does not start processing an RDD in the DStream until the previous RDD finishes, even when the previous RDD uses only a few partitions and little CPU/memory.
Is this Spark's default behaviour? Is there a way to change it?
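For reference, a minimal sketch of the kind of program described, assuming a 5-second batch interval; the host, port, output path, and the map step are placeholders for the actual transformations:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="SocketToCsv")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches (assumed)

    # Receive a live text stream from a socket (host/port are placeholders).
    lines = ssc.socketTextStream("localhost", 9999)

    # Placeholder transformation: turn each input line into a CSV row.
    csv_rows = lines.map(lambda line: ",".join(line.split()))

    # Write each batch's RDD out as text files, one directory per batch.
    csv_rows.saveAsTextFiles("output/batch")

    ssc.start()
    ssc.awaitTermination()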
Could you kindly post your code and the problem you are facing?
Conceptually, the data within each time interval forms an RDD at the end of that interval (that's the idea behind the mini-batch data abstraction).
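As a side note, the behaviour described is consistent with the default job scheduling: Spark Streaming submits the jobs of each batch and, by default, runs only one job at a time, so the next batch waits for the previous one to finish. There is an undocumented, experimental setting, spark.streaming.concurrentJobs, that allows jobs from different batches to overlap; a sketch of how it could be set (the value 2 is arbitrary):

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    # Experimental/undocumented: allow up to 2 batch jobs to run concurrently.
    # With values > 1, output ordering across batches is no longer guaranteed.
    conf = SparkConf().set("spark.streaming.concurrentJobs", "2")
    sc = SparkContext(conf=conf, appName="ConcurrentBatches")
    ssc = StreamingContext(sc, 5)

Whether this is safe depends on the job; since the batches here are independent (no window operation, no state), letting them overlap should at least be logically consistent.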