During the reduce phase of my MapReduce program, the only operation I perform is to concatenate each value in the provided Iterator, as shown below:
```java
public void reduce(Text key, Iterator<Text> values,
                   OutputCollector<Text, Text> output, Reporter reporter)
        throws IOException {
    Text next;
    Text outKey = new Text();
    Text outVal = new Text();
    StringBuilder sb = new StringBuilder();
    // Concatenate every value for this key, separated by commas.
    while (values.hasNext()) {
        next = values.next();
        sb.append(next.toString());
        if (values.hasNext())
            sb.append(',');
    }
    outKey.set(key.toString());
    outVal.set(sb.toString());
    output.collect(outKey, outVal);
}
```
My problem is that some of the reduced output values are huge lines of text. They are so large that, even with a very large initial capacity, the StringBuilder has to grow (double) its capacity several times to hold all of the iterator's contents, which causes memory problems.
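To make the growth pattern concrete, the small program below appends fixed-size chunks to a `StringBuilder` and records every time `capacity()` changes. The class name `CapacityGrowthDemo`, the initial capacity, and the chunk sizes are hypothetical illustration values, not taken from the original job. On common JDKs the capacity roughly doubles on each resize (the exact policy is an implementation detail), and while the contents are copied, both the old and the new backing arrays are live at the same time.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical demo class: shows how often a StringBuilder resizes while growing.
public class CapacityGrowthDemo {

    /**
     * Appends `chunks` copies of a `chunkSize`-character string to a StringBuilder
     * created with `initialCapacity`, and returns the capacity observed after each
     * resize, starting with the initial capacity.
     */
    static List<Integer> recordCapacities(int initialCapacity, int chunkSize, int chunks) {
        StringBuilder sb = new StringBuilder(initialCapacity);
        String chunk = "x".repeat(chunkSize); // String.repeat requires Java 11+
        List<Integer> capacities = new ArrayList<>();
        capacities.add(sb.capacity());
        for (int i = 0; i < chunks; i++) {
            sb.append(chunk);
            int cap = sb.capacity();
            if (cap != capacities.get(capacities.size() - 1)) {
                // A resize happened: a larger array was allocated and the old contents copied into it.
                capacities.add(cap);
            }
        }
        return capacities;
    }

    public static void main(String[] args) {
        // Arbitrary sizes: 1,000-char initial buffer, 100 chunks of 1,000 chars each.
        List<Integer> caps = recordCapacities(1_000, 1_000, 100);
        System.out.println("Capacities: " + caps);
        System.out.println("Resizes: " + (caps.size() - 1));
    }
}
```

Presizing with a larger initial capacity only moves the problem: once a single value exceeds it, the same doubling sequence starts again.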
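In a traditional Java application, this would suggest that a buffered write to a file is the preferred way to produce the output. How do you handle extremely large output key-value pairs in Hadoop? Should I stream the results directly to a file on HDFS (one file per reduce call)? Is there a way to buffer the output other than the output.collect method?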
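The "buffered write to a file" idea amounts to streaming. You write each value to an output stream as it comes off the iterator instead of accumulating the whole result in memory. The sketch below shows only that core idea with plain `java.io`. The class `StreamingJoin` and the method `writeJoined` are hypothetical names. In a real reducer, the `Writer` could wrap a stream opened on HDFS with `FileSystem.create(Path)`, one file per key as the question suggests, but that wiring is an assumption and is not shown. Hadoop's `Text` is also not a `CharSequence`, so each value would have to be converted (for example with `toString()`) or written as raw bytes.

```java
import java.io.IOException;
import java.io.StringWriter;
import java.io.Writer;
import java.util.Arrays;
import java.util.Iterator;

// Hypothetical helper: writes comma-separated values without building one big String.
public class StreamingJoin {

    /**
     * Writes the values, separated by commas, to `out` one at a time.
     * Memory use is bounded by the largest single value (plus whatever the
     * Writer itself buffers), not by the total size of all values.
     */
    static void writeJoined(Iterator<? extends CharSequence> values, Writer out) throws IOException {
        boolean first = true;
        while (values.hasNext()) {
            if (!first) {
                out.write(','); // separator goes before every value except the first
            }
            out.append(values.next());
            first = false;
        }
        out.flush();
    }

    public static void main(String[] args) throws IOException {
        // StringWriter stands in for a file- or HDFS-backed Writer in this demo.
        StringWriter sw = new StringWriter();
        writeJoined(Arrays.asList("alpha", "beta", "gamma").iterator(), sw);
        System.out.println(sw); // prints "alpha,beta,gamma"
    }
}
```

The trade-off is that values written this way bypass `output.collect`, so downstream consumers would have to read the side files rather than the job's regular output.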
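Note: I have already increased the memory/heap size as far as possible. Several sources have also suggested that increasing the number of reducers can help with memory/heap problems, but here the problem was traced directly to StringBuilder expanding its capacity.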