Is there a limit on the number of results from a Dataflow input file pattern glob?
Update:
We have been seeing these 400-class errors:
com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload exceeds allowable limit: 50000.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload exceeds allowable limit: 50000.",
  "status" : "INVALID_ARGUMENT"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
when the glob resolves to:

    TOTAL: 60 objects, 8405391 bytes (8.02 MiB)

We have also been experiencing increased variability in which input globs hit this limit over the past several days.
--
Recently we have observed job failures when filepattern specs resolve to large numbers of files passed as input to Dataflow jobs. Examples of the messages produced in these scenarios are:
Apr 29, 2015, 9:22:51 (5dd3e79031bdcc45):

com.google.api.client.googleapis.json.GoogleJsonResponseException: 400 Bad Request
{
  "code" : 400,
  "errors" : [ {
    "domain" : "global",
    "message" : "Request payload exceeds allowable limit: 50000.",
    "reason" : "badRequest"
  } ],
  "message" : "Request payload exceeds allowable limit: 50000.",
  "status" : "INVALID_ARGUMENT"
}
        at com.google.api.client.googleapis.json.GoogleJsonResponseException.from(GoogleJsonResponseException.java:145)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:113)
        at com.google.api.client.googleapis.services.json.AbstractGoogleJsonClientRequest.newExceptionOnError(AbstractGoogleJsonClientRequest.java:40)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest$1.interceptResponse(AbstractGoogleClientRequest.java:321)
        at com.google.api.client.http.HttpRequest.execute(HttpRequest.java:1049)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:419)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.executeUnparsed(AbstractGoogleClientRequest.java:352)
        at com.google.api.client.googleapis.services.AbstractGoogleClientRequest.execute(AbstractGoogleClientRequest.java:469)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$DataflowWorkUnitClient.reportWorkItemStatus(DataflowWorkerHarness.java:273)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.reportStatus(DataflowWorker.java:209)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.doWork(DataflowWorker.java:157)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorker.getAndPerformWork(DataflowWorker.java:95)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:139)
        at com.google.cloud.dataflow.sdk.runners.worker.DataflowWorkerHarness$WorkerThread.call(DataflowWorkerHarness.java:124)
        at java.util.concurrent.FutureTask.run(FutureTask.java:266)
        at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
        at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
        at java.lang.Thread.run(Thread.java:745)

9:22:51 Failed task is going to be retried.
We have had some success working around this by adjusting job parallelization, but we are wondering whether there is a hard limit or quota somewhere that we are running into. Retried tasks inevitably fail once the maximum number of retries is reached, causing the job to fail.
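For context, this is roughly how the input is wired up. The following is only a minimal sketch using the Dataflow Java SDK 1.x; the bucket path, step name, and pipeline options are placeholders rather than our actual job, and the downstream transforms are elided.

    import com.google.cloud.dataflow.sdk.Pipeline;
    import com.google.cloud.dataflow.sdk.io.TextIO;
    import com.google.cloud.dataflow.sdk.options.DataflowPipelineOptions;
    import com.google.cloud.dataflow.sdk.options.PipelineOptionsFactory;
    import com.google.cloud.dataflow.sdk.values.PCollection;

    public class GlobReadExample {
      public static void main(String[] args) {
        // Placeholder options; in practice these come from --project, --stagingLocation, etc.
        DataflowPipelineOptions options =
            PipelineOptionsFactory.fromArgs(args).withValidation().as(DataflowPipelineOptions.class);

        Pipeline p = Pipeline.create(options);

        // Hypothetical filepattern glob; ours expands to dozens of objects (~8 MiB total),
        // and the failures appear as the number of matched files grows.
        PCollection<String> lines =
            p.apply(TextIO.Read.named("ReadInputGlob")
                               .from("gs://example-bucket/input/part-*"));

        // ... downstream transforms elided ...

        p.run();
      }
    }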
Thanks!
Sal
The Dataflow service has been updated to handle larger requests of this type, and it should no longer produce this issue.