python - How to avoid pickling a Celery task?
My scenario is as follows: I have a large machine learning model, which is computed by a bunch of workers. In essence, each worker computes its own part of the model and then exchanges the results with the others in order to maintain a globally consistent state of the model.
So, every Celery task computes its own part of the job. That means the tasks aren't stateless, and here is my trouble: if I call some_task.delay(123, 456), in reality I'm not sending two integers here!
I'm sending the whole state of the task, pickled somewhere inside Celery. That state is typically around 200 MB :-((
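For illustration, here is a rough sketch of such a stateful task; the names ModelPartTask, load_model_part and compute are made up, not my actual code:

    from celery import Celery, Task

    app = Celery('workers', broker='redis://localhost:6379/0')

    class ModelPartTask(Task):
        """Task class that carries heavy per-worker model state."""
        def __init__(self):
            # Hypothetical loader: ~200 MB of model state
            # attached to the task instance.
            self.model_part = load_model_part()

    @app.task(base=ModelPartTask, bind=True)
    def some_task(self, start, end):
        # The arguments are just two integers, but the task
        # instance itself holds the large state.
        return self.model_part.compute(start, end)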
I know it's possible to select a decent serializer in Celery, but my question is how to avoid pickling the data that lives in the task. How can I pickle only the arguments of the task? Here is a citation from celery/app/task.py:
    def __reduce__(self):
        # - tasks are pickled by the name of the task only, and the
        #   receiver grabs it from the local registry.
        # - in later versions the module of the task is included as well,
        #   and the receiving side tries to import that module, so that
        #   it works even if the task has not been registered.
        mod = type(self).__module__
        mod = mod if mod and mod in sys.modules else None
        return (_unpickle_task_v2, (self.name, mod), None)
I don't want this to happen. Is there a simple way around it, or am I forced to build my own Celery (which is ugly to imagine)?
Don't use the Celery results backend for this. Use a separate data store.
While you could use Task.ignore_result, that would mean losing the ability to track task status etc.
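For reference, disabling result storage looks like this (a minimal sketch; the app object is assumed to be configured elsewhere):

    @app.task(ignore_result=True)
    def compute_part(start, end):
        # No return value is stored in the results backend,
        # but the task's status can no longer be tracked either.
        ...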
The best solution is to use one storage engine (e.g. Redis) as your results backend. You should then set up a separate storage engine (a separate instance of Redis, or maybe something like MongoDB, depending on your needs) to store the actual data.
That way you can still see the status of your tasks, but the large data sets do not affect the operation of Celery.
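A minimal sketch of that split, assuming Redis for both roles; the hosts, ports, and database numbers are placeholders:

    import redis
    from celery import Celery

    app = Celery(
        'workers',
        broker='redis://localhost:6379/0',
        backend='redis://localhost:6379/1',  # results backend: status and small results only
    )

    # A separate Redis instance (or MongoDB, etc.) dedicated to the large model data.
    data_store = redis.Redis(host='localhost', port=6380, db=0)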
Switching to the json serializer may reduce the serialization overhead, depending on the format of the data you generate. However, it can't solve the underlying problem of putting too much data through the results backend.
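The serializer is a configuration setting; a sketch using the Celery 4+ option names (older versions use the uppercase CELERY_* forms, and newer versions already default to json):

    app.conf.task_serializer = 'json'      # serializer for task messages
    app.conf.result_serializer = 'json'    # serializer for results
    app.conf.accept_content = ['json']     # reject other content types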
The results backend can only handle relatively small amounts of data: once you go over a certain limit, you start to prevent the proper operation of its primary task, which is communicating task status.
I suggest updating your tasks so that they return a lightweight data structure containing useful metadata (to e.g. facilitate co-ordination between tasks), and storing the "real" data in a dedicated storage solution.
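A sketch of that pattern, reusing the hypothetical data_store from above (heavy_computation is a stand-in for the real work):

    import pickle

    @app.task
    def compute_part(part_id, start, end):
        result = heavy_computation(start, end)      # hypothetical: produces the ~200 MB chunk
        key = f'model:part:{part_id}'
        data_store.set(key, pickle.dumps(result))   # the "real" data goes to the data store
        # Only lightweight metadata travels through the results backend.
        return {'part_id': part_id, 'key': key}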