pyspark - Spark Streaming processes RDDs one by one?
I wrote a Spark Streaming program in PySpark. It receives a live input text stream via socketTextStream, applies some transformations, and saves the result as CSV files with saveAsTextFile (roughly like the sketch below). No Spark Streaming window operation is used, and no previous data is required to create the output data.
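For context, here is a minimal sketch of that kind of program. The host, port, batch interval, and output path are hypothetical placeholders; note that on a DStream the method is saveAsTextFiles, which calls saveAsTextFile on each batch's RDD:

from pyspark import SparkContext
from pyspark.streaming import StreamingContext

sc = SparkContext(appName="SocketToCsv")
ssc = StreamingContext(sc, 5)  # 5-second micro-batches (placeholder interval)

# Hypothetical host/port for the live text source.
lines = ssc.socketTextStream("localhost", 9999)

# Example transformation: turn whitespace-separated fields into CSV rows.
rows = lines.map(lambda line: ",".join(line.split()))

# Writes each batch's RDD as text files under a timestamped directory prefix.
rows.saveAsTextFiles("output/batch")

ssc.start()
ssc.awaitTermination()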
But it seems that Spark does not start processing an RDD in the DStream until the previous RDD finishes, even when the previous RDD uses only a few partitions and little CPU/memory.
Is this Spark's default behaviour? Is there a way to change this behaviour?
Can you kindly post your code and the problem you are facing?
Conceptually, the data received within each time interval forms an RDD at the end of that interval (that is the idea of the mini-batch data abstraction). By default, Spark Streaming then submits the job for each batch only after the previous batch's job has completed, so batches are processed sequentially, one at a time.
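If concurrent batch processing is really needed, one knob that is often suggested is the experimental, undocumented spark.streaming.concurrentJobs setting, which lets the scheduler run jobs from more than one batch at once. A hedged sketch; it is not officially supported and can produce out-of-order output:

from pyspark import SparkConf, SparkContext
from pyspark.streaming import StreamingContext

# Experimental, undocumented setting: allow up to 2 batch jobs to run concurrently.
conf = SparkConf().set("spark.streaming.concurrentJobs", "2")
sc = SparkContext(conf=conf)
ssc = StreamingContext(sc, 5)  # same 5-second batch interval as above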