I am running Mahout 0.6 from the command line on an Amazon Elastic MapReduce cluster, trying to canopy-cluster about 500 short documents, and the jobs keep failing with the message "Error: Java heap space".
Based on previous questions here and elsewhere, I have turned up every memory knob I can find:
conf/hadoop-env.sh: setting all the heap sizes there up to 1.5 GB on small instances and even 4 GB on large instances.
conf/mapred-site.xml: adding the mapred.{map,reduce}.child.java.opts properties and setting their value to -Xmx4000m.
$MAHOUT_HOME/bin/mahout: increasing JAVA_HEAP_MAX and setting MAHOUT_HEAPSIZE to 6 GB (on large instances).
And still the problem persists. I have been banging my head against this for too long. Does anyone have any suggestions?
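For concreteness, the mapred-site.xml change described above would look something like the following. This is a sketch assuming the usual Hadoop `<property>` layout inside the file's `<configuration>` element, not a verbatim copy of the file on the cluster.

```xml
<!-- conf/mapred-site.xml (sketch): JVM options for the child task processes. -->
<!-- These entries sit inside the existing <configuration> element. -->
<property>
  <!-- heap limit for each map task JVM -->
  <name>mapred.map.child.java.opts</name>
  <value>-Xmx4000m</value>
</property>
<property>
  <!-- heap limit for each reduce task JVM (the reducer is what fails below) -->
  <name>mapred.reduce.child.java.opts</name>
  <value>-Xmx4000m</value>
</property>
```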
The full command and output look something like this (run on a cluster of large instances, in the hope that this would alleviate the problem):
hadoop@ip-10-80-202-112:~$ mahout-distribution-0.6/bin/mahout canopy -i sparse-data/2010/tf-vectors -o canopy-out/2010 -dm org.apache.mahout.common.distance.TanimotoDistanceMeasure -ow -t1 0.5 -t2 0.005 -cl
run with heapsize 6000
-Xmx6000m
MAHOUT_LOCAL is not set; adding HADOOP_CONF_DIR to classpath.
Running on hadoop, using HADOOP_HOME=/home/hadoop
No HADOOP_CONF_DIR set, using /home/hadoop/conf
MAHOUT-JOB: /home/hadoop/mahout-distribution-0.6/mahout-examples-0.6-job.jar
12/04/29 19:50:23 INFO common.AbstractJob: Command line arguments: {
12/04/29 19:50:24 INFO common.HadoopUtil: Deleting canopy-out/2010
12/04/29 19:50:24 INFO canopy.CanopyDriver: Build Clusters Input: sparse-data/2010/tf-vectors Out: canopy-out/2010 Measure: org.apache.mahout.common.distance.TanimotoDistanceMeasure@a383118 t1: 0.5 t2: 0.0050
12/04/29 19:50:24 INFO mapred.JobClient: Default number of map tasks: null
12/04/29 19:50:24 INFO mapred.JobClient: Setting default number of map tasks based on cluster size to : 24
12/04/29 19:50:24 INFO mapred.JobClient: Default number of reduce tasks: 1
12/04/29 19:50:25 INFO mapred.JobClient: Setting group to hadoop
12/04/29 19:50:25 INFO input.FileInputFormat: Total input paths to process : 1
12/04/29 19:50:25 INFO mapred.JobClient: Running job: job_201204291846_0004
12/04/29 19:50:26 INFO mapred.JobClient: map 0% reduce 0%
12/04/29 19:50:45 INFO mapred.JobClient: map 27% reduce 0%
[ ... Continues fine until... ]
12/04/29 20:05:54 INFO mapred.JobClient: map 100% reduce 99%
12/04/29 20:06:12 INFO mapred.JobClient: map 100% reduce 0%
12/04/29 20:06:20 INFO mapred.JobClient: Task Id : attempt_201204291846_0004_r_000000_0, Status : FAILED
Error: Java heap space
12/04/29 20:06:41 INFO mapred.JobClient: map 100% reduce 33%
12/04/29 20:06:44 INFO mapred.JobClient: map 100% reduce 68%
[ ... Repeats for several iterations, until... ]
12/04/29 20:37:58 INFO mapred.JobClient: map 100% reduce 0%
12/04/29 20:38:09 INFO mapred.JobClient: Job complete: job_201204291846_0004
12/04/29 20:38:09 INFO mapred.JobClient: Counters: 23
12/04/29 20:38:09 INFO mapred.JobClient: Job Counters
12/04/29 20:38:09 INFO mapred.JobClient: Launched reduce tasks=4
12/04/29 20:38:09 INFO mapred.JobClient: SLOTS_MILLIS_MAPS=94447
12/04/29 20:38:09 INFO mapred.JobClient: Total time spent by all reduces waiting after reserving slots (ms)=0
12/04/29 20:38:09 INFO mapred.JobClient: Total time spent by all maps waiting after reserving slots (ms)=0
12/04/29 20:38:09 INFO mapred.JobClient: Rack-local map tasks=1
12/04/29 20:38:09 INFO mapred.JobClient: Launched map tasks=1
12/04/29 20:38:09 INFO mapred.JobClient: Failed reduce tasks=1
12/04/29 20:38:09 INFO mapred.JobClient: SLOTS_MILLIS_REDUCES=23031
12/04/29 20:38:09 INFO mapred.JobClient: FileSystemCounters
12/04/29 20:38:09 INFO mapred.JobClient: HDFS_BYTES_READ=24100612
12/04/29 20:38:09 INFO mapred.JobClient: FILE_BYTES_WRITTEN=49399745
12/04/29 20:38:09 INFO mapred.JobClient: File Input Format Counters
12/04/29 20:38:09 INFO mapred.JobClient: Bytes Read=24100469
12/04/29 20:38:09 INFO mapred.JobClient: Map-Reduce Framework
12/04/29 20:38:09 INFO mapred.JobClient: Map output materialized bytes=49374728
12/04/29 20:38:09 INFO mapred.JobClient: Combine output records=0
12/04/29 20:38:09 INFO mapred.JobClient: Map input records=409
12/04/29 20:38:09 INFO mapred.JobClient: Physical memory (bytes) snapshot=2785939456
12/04/29 20:38:09 INFO mapred.JobClient: Spilled Records=409
12/04/29 20:38:09 INFO mapred.JobClient: Map output bytes=118596530
12/04/29 20:38:09 INFO mapred.JobClient: CPU time spent (ms)=83190
12/04/29 20:38:09 INFO mapred.JobClient: Total committed heap usage (bytes)=2548629504
12/04/29 20:38:09 INFO mapred.JobClient: Virtual memory (bytes) snapshot=4584386560
12/04/29 20:38:09 INFO mapred.JobClient: Combine input records=0
12/04/29 20:38:09 INFO mapred.JobClient: Map output records=409
12/04/29 20:38:09 INFO mapred.JobClient: SPLIT_RAW_BYTES=143
Exception in thread "main" java.lang.InterruptedException: Canopy Job failed processing sparse-data/2010/tf-vectors
at org.apache.mahout.clustering.canopy.CanopyDriver.buildClustersMR(CanopyDriver.java:349)
at org.apache.mahout.clustering.canopy.CanopyDriver.buildClusters(CanopyDriver.java:236)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:145)
at org.apache.mahout.clustering.canopy.CanopyDriver.run(CanopyDriver.java:109)
at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:65)
at org.apache.mahout.clustering.canopy.CanopyDriver.main(CanopyDriver.java:61)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.ProgramDriver$ProgramDescription.invoke(ProgramDriver.java:68)
at org.apache.hadoop.util.ProgramDriver.driver(ProgramDriver.java:139)
at org.apache.mahout.driver.MahoutDriver.main(MahoutDriver.java:188)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:39)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:25)
at java.lang.reflect.Method.invoke(Method.java:597)
at org.apache.hadoop.util.RunJar.main(RunJar.java:156)