I have a small cluster of 5 machines that I use to crawl a few websites. Apache Nutch 2.3.1 is configured on the master node and there are 4 worker nodes. Most of the Hadoop configuration is left at its defaults. Each node has 16 GB of memory in total. While running a job I observed the following error and the job failed:
2019-04-09 23:04:06,732 INFO [main] org.apache.gora.mapreduce.GoraRecordWriter: Flushing the datastore after 5590000 records
2019-04-09 23:07:27,944 ERROR [main] org.apache.hadoop.mapred.YarnChild: Error running child : java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.Arrays.copyOf(Arrays.java:3181)
    at java.util.ArrayList.grow(ArrayList.java:261)
    at java.util.ArrayList.ensureExplicitCapacity(ArrayList.java:235)
    at java.util.ArrayList.ensureCapacityInternal(ArrayList.java:227)
    at java.util.ArrayList.add(ArrayList.java:458)
    at org.apache.hadoop.hbase.client.MultiAction.add(MultiAction.java:76)
Now the question is: which heap setting should I update, and where, to get rid of this issue? Is this a datanode problem or a MapReduce problem?
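Since the OutOfMemoryError is thrown by YarnChild (the map task JVM) rather than by a datanode process, I assume the relevant knobs are the MapReduce task memory settings in mapred-site.xml. Below is a sketch of what I am considering; the property names are the standard Hadoop 2.x ones, but the values are only placeholders I picked for a 16 GB node, not something I have tested:

    <configuration>
      <!-- Size of the YARN container requested for each map/reduce task (MB) -->
      <property>
        <name>mapreduce.map.memory.mb</name>
        <value>4096</value>
      </property>
      <property>
        <name>mapreduce.reduce.memory.mb</name>
        <value>4096</value>
      </property>
      <!-- JVM heap for the child task (YarnChild); kept below the container size -->
      <property>
        <name>mapreduce.map.java.opts</name>
        <value>-Xmx3276m</value>
      </property>
      <property>
        <name>mapreduce.reduce.java.opts</name>
        <value>-Xmx3276m</value>
      </property>
    </configuration>

Is raising these the right approach, or should the heap be increased somewhere else (for example on the datanodes)? And if the error only occurs in the map phase, as the stack trace suggests, is it enough to raise only the map-side settings?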