pyspark - Does Spark Streaming process RDDs one by one?
I wrote a Spark Streaming program in PySpark. It receives a live input text stream via socketTextStream, applies some transformations, and saves the result as CSV files with saveAsTextFile. No Spark Streaming window operation is used, and no previous data is needed to produce the output.
But it seems that Spark does not start processing an RDD in the DStream until the previous RDD finishes, even when the previous RDD uses only a few partitions and little CPU/memory.
Is this Spark's default behaviour? Is there a way to change it?
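For reference, a minimal sketch of the kind of program described, assuming a 5-second batch interval; the host, port, output path, and the map step are placeholders for the actual transformations:

    from pyspark import SparkContext
    from pyspark.streaming import StreamingContext

    sc = SparkContext(appName="SocketToCsv")
    ssc = StreamingContext(sc, 5)  # 5-second micro-batches (assumed)

    # Receive a live text stream from a socket (host/port are placeholders).
    lines = ssc.socketTextStream("localhost", 9999)

    # Placeholder transformation: turn each input line into a CSV row.
    csv_rows = lines.map(lambda line: ",".join(line.split()))

    # Write each batch's RDD out as text files, one directory per batch.
    csv_rows.saveAsTextFiles("output/batch")

    ssc.start()
    ssc.awaitTermination()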
Could you kindly post your code and the problem you are facing?
Conceptually, the data within each time interval forms an RDD at the end of that interval (that's the idea behind the mini-batch data abstraction).
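As a side note, the behaviour described is consistent with the default job scheduling: Spark Streaming submits the jobs of each batch and, by default, runs only one job at a time, so the next batch waits for the previous one to finish. There is an undocumented, experimental setting, spark.streaming.concurrentJobs, that allows jobs from different batches to overlap; a sketch of how it could be set (the value 2 is arbitrary):

    from pyspark import SparkConf, SparkContext
    from pyspark.streaming import StreamingContext

    # Experimental/undocumented: allow up to 2 batch jobs to run concurrently.
    # With values > 1, output ordering across batches is no longer guaranteed.
    conf = SparkConf().set("spark.streaming.concurrentJobs", "2")
    sc = SparkContext(conf=conf, appName="ConcurrentBatches")
    ssc = StreamingContext(sc, 5)

Whether this is safe depends on the job; since the batches here are independent (no window operation, no state), letting them overlap should at least be logically consistent.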