java - Map task stuck at 50%
I have mapper and reducer classes whose input and output value types are set as below.

    // reducer
    job.setOutputKeyClass(LongWritable.class);
    job.setOutputValueClass(MapperOutput.class);
    // mapper
    job.setMapOutputKeyClass(LongWritable.class);
    job.setMapOutputValueClass(MapperOutput.class);

Here MapperOutput is a custom class defined by me, and it implements the Writable interface.
A part of my mapper function is below.
    public void map(LongWritable arg0, Text arg1, Context context) throws IOException {
        try {
            String tran = null;
            String ip = arg1.toString();
            System.out.println(ip);
            BufferedReader br = new BufferedReader(new StringReader(ip));
            HSynopsis bdelta = null;
            HSynopsis b = null, bnew = null;
            hashentries = (int) Math.floor(calculatehashentries()); // hash table size
            System.out.println("hash entries: " + hashentries);

            // initialize the main hash table and the delta hash table
            hashtable = new ArrayList<>(hashentries);
            for (int i = 0; i < hashentries; i++) {
                hashtable.add(i, null);
            }
            deltahashtable = new ArrayList<>(hashentries);
            for (int i = 0; i < hashentries; i++) {
                deltahashtable.add(i, null);
            }

            while ((tran = br.readLine()) != null) {
                createbinaryrep(tran);
                for (int i = 0; i < deltahashtable.size(); i++) {
                    bdelta = deltahashtable.get(i);
                    if (bdelta != null) {
                        if (bdelta.nlast_access >= (alpha * transactioncount)) {
                            // transmit bdelta to the coordinator
                            MapperOutput mp = new MapperOutput(transactioncount, bdelta);
                            context.write(new LongWritable(i), mp);
                            // merge bdelta into b
                            b = hashtable.get(i);
                            bnew = merge(b, bdelta);
                            hashtable.set(i, bnew);
                            // release bdelta
                            deltahashtable.set(i, null);
                        }
                    }
                }
            }
        } catch (Exception e) {
            e.printStackTrace();
        }
    }

My reducer task is below.
    public void reduce(LongWritable index, Iterator<MapperOutput> mpvalues, Context context) {
        while (mpvalues.hasNext()) {
            /* some code here */
        }
        context.write(index, mp);
    }

As the mapper code shows, the algorithm demands that I send output to the reducer only when a condition is satisfied (inside the for loop), and the mapper, after writing to the context, continues to execute the loop.
When I try to run the code on a single-node Hadoop cluster, I get the following log.
    15/04/29 03:19:23 WARN util.NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
    15/04/29 03:19:23 WARN mapred.JobClient: Use GenericOptionsParser for parsing the arguments. Applications should implement Tool for the same.
    15/04/29 03:19:23 WARN mapred.JobClient: No job jar file set. User classes may not be found. See JobConf(Class) or JobConf#setJar(String).
    15/04/29 03:19:23 INFO input.FileInputFormat: Total input paths to process : 2
    15/04/29 03:19:23 WARN snappy.LoadSnappy: Snappy native library not loaded
    15/04/29 03:19:24 INFO mapred.JobClient: Running job: job_local599819429_0001
    15/04/29 03:19:24 INFO mapred.LocalJobRunner: Waiting for map tasks
    15/04/29 03:19:24 INFO mapred.LocalJobRunner: Starting task: attempt_local599819429_0001_m_000000_0
    15/04/29 03:19:24 INFO util.ProcessTree: setsid exited with exit code 0
    15/04/29 03:19:24 INFO mapred.Task: Using ResourceCalculatorPlugin : org.apache.hadoop.util.LinuxResourceCalculatorPlugin@74ff364a
    15/04/29 03:19:24 INFO mapred.MapTask: Processing split: file:/home/pooja/adm/frequentpatternmining/input/file.dat~:0+24
    15/04/29 03:19:24 INFO mapred.MapTask: io.sort.mb = 100
    15/04/29 03:19:24 INFO mapred.MapTask: data buffer = 79691776/99614720
    15/04/29 03:19:24 INFO mapred.MapTask: record buffer = 262144/327680
    15/04/29 03:19:24 INFO mapred.MapTask: Starting flush of map output
    15/04/29 03:19:24 INFO mapred.MapTask: Starting flush of map output
    15/04/29 03:19:25 INFO mapred.JobClient: map 0% reduce 0%
    15/04/29 03:19:30 INFO mapred.LocalJobRunner:
    15/04/29 03:19:31 INFO mapred.JobClient: map 50% reduce 0%

The map task gets stuck at 50% and doesn't proceed.
When I run the map function separately (not in Hadoop), I don't have any problem with an infinite loop.
Can someone please help me with this?
Edit 1: The input file is on the order of KB. Could that be causing a problem with the distribution of data to the mappers?
Edit 2: As mentioned in the answer, I changed Iterator to Iterable. The map still gets stuck, now at 100%, and restarts after some time.
I see the following in the JobTracker log:
    2015-04-29 13:26:28,026 INFO org.apache.hadoop.mapred.TaskInProgress: Error from attempt_201504291300_0003_m_000000_0: Task attempt_201504291300_0003_m_000000_0 failed to report status for 600 seconds. Killing!
    2015-04-29 13:26:28,026 INFO org.apache.hadoop.mapred.JobTracker: Removing task 'attempt_201504291300_0003_m_000000_0'
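You have mistakenly used Iterator in the reduce function instead of Iterable.
You need to use Iterable because you are using the new MapReduce API: the reduce(Object, Iterable, org.apache.hadoop.mapreduce.Reducer.Context) method is the one that is called for each <key, (collection of values)> in the sorted inputs. A reduce method that takes an Iterator has a different signature, so it does not override that method and the framework never calls it.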
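To see why the parameter type matters, here is a minimal sketch in plain Java that does not depend on Hadoop. BaseReducer is a hypothetical stand-in for org.apache.hadoop.mapreduce.Reducer (it is not the real class), and its pass-through default is an assumption modelled on the identity behaviour of the real reducer. The Iterator version is only an overload, so the base method runs instead; the Iterable version is a real override.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Iterator;
import java.util.List;

public class OverloadVsOverride {

    // Hypothetical stand-in for Hadoop's Reducer: the "framework" only ever
    // calls reduce(key, Iterable), and the default passes values through.
    static class BaseReducer {
        final List<String> output = new ArrayList<>();

        protected void reduce(Long key, Iterable<Integer> values) {
            for (Integer v : values) {
                output.add(key + "=" + v);
            }
        }

        // Plays the role of the framework driving the reducer.
        final void run(Long key, List<Integer> values) {
            reduce(key, values);
        }
    }

    // Takes an Iterator: this is a new overload, so run() never reaches it.
    static class IteratorReducer extends BaseReducer {
        protected void reduce(Long key, Iterator<Integer> values) {
            int sum = 0;
            while (values.hasNext()) {
                sum += values.next();
            }
            output.add(key + "=" + sum);
        }
    }

    // Takes an Iterable: a true override, which @Override lets the compiler check.
    static class IterableReducer extends BaseReducer {
        @Override
        protected void reduce(Long key, Iterable<Integer> values) {
            int sum = 0;
            for (Integer v : values) {
                sum += v;
            }
            output.add(key + "=" + sum);
        }
    }

    static List<String> runWith(BaseReducer reducer) {
        reducer.run(1L, Arrays.asList(10, 20, 30));
        return reducer.output;
    }

    public static void main(String[] args) {
        System.out.println(runWith(new IteratorReducer())); // [1=10, 1=20, 1=30]
        System.out.println(runWith(new IterableReducer())); // [1=60]
    }
}
```

Adding @Override to your own reduce method is a cheap safeguard: with the Iterator signature the compiler rejects it, which points straight at the mismatch.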