Hadoop: Reading Map Input from Two Different Source Files

I have a tool that chains many Mappers and Reducers, and at some point I need to merge the results of earlier steps in a single map-reduce pass. For example, as input I have two data files:

/input/a.txt
apple,10
orange,20

/input/b.txt
apple;5
orange;40

The result should be c.txt, where c.value = a.value * b.value:

/output/c.txt
apple,50   // 10 * 5
orange,800 // 40 * 20

How can I do that? I solved it by having the mappers emit Key => MyMapWritable (type = 1 or 2, value) and merging (actually, multiplying) the data in the reducer. It works, but:

  • I have a feeling this could be done more simply (the current approach doesn't smell right).
  • Is it possible, inside the Mapper, to find out which file (a.txt or b.txt) provided the current record? At the moment I just use different separators, a comma and a semicolon :(
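The approach described above (tag each value with its source, group by key, multiply in the reducer) is a standard reduce-side join. Below is a minimal plain-Java sketch of that data flow, assuming each key appears at most once per file. It simulates the shuffle with an in-memory map and does not use the Hadoop API; the class, method, and field names are hypothetical. Because multiplication is commutative, the tags are not strictly needed to compute the product; here they are only used to drop keys that appear in just one of the files.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class ReduceSideJoinSketch {

    // Hypothetical stand-in for the question's MyMapWritable: a source tag (1 = a.txt, 2 = b.txt) plus a value.
    static final class Tagged {
        final int source;
        final long value;

        Tagged(int source, long value) {
            this.source = source;
            this.value = value;
        }
    }

    // Reducer logic for one key: multiply the a.txt value by the b.txt value.
    // Returns null when the key is missing from one of the inputs (inner-join semantics).
    static Long reduce(List<Tagged> values) {
        Long a = null;
        Long b = null;
        for (Tagged t : values) {
            if (t.source == 1) {
                a = t.value;
            } else if (t.source == 2) {
                b = t.value;
            }
        }
        return (a == null || b == null) ? null : a * b;
    }

    // Simulates map output plus shuffle: tag every record, group by key, then reduce each group.
    static Map<String, Long> join(Map<String, Long> fileA, Map<String, Long> fileB) {
        Map<String, List<Tagged>> grouped = new LinkedHashMap<>();
        fileA.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged(1, v)));
        fileB.forEach((k, v) -> grouped.computeIfAbsent(k, x -> new ArrayList<>()).add(new Tagged(2, v)));

        Map<String, Long> result = new LinkedHashMap<>();
        grouped.forEach((key, vals) -> {
            Long product = reduce(vals);
            if (product != null) {
                result.put(key, product);
            }
        });
        return result;
    }

    public static void main(String[] args) {
        Map<String, Long> a = new LinkedHashMap<>();
        a.put("apple", 10L);
        a.put("orange", 20L);
        Map<String, Long> b = new LinkedHashMap<>();
        b.put("apple", 5L);
        b.put("orange", 40L);
        join(a, b).forEach((k, v) -> System.out.println(k + "," + v));
    }
}
```

Running `main` prints `apple,50` and `orange,800`, matching the expected c.txt.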

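If both inputs are sorted by key and partitioned the same way, you can use CompositeInputFormat to perform a map-side join. Note that, depending on your Hadoop version, it may only be available in the old mapred API and not in the newer mapreduce API.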
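A map-side join is possible because, when both inputs are already sorted by key, matching records can be paired in a single merge pass without a shuffle. The plain-Java sketch below illustrates that merge on the example data. It is not CompositeInputFormat's API, the class and method names are hypothetical, and it assumes each key appears at most once per input.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

public class MapSideMergeJoinSketch {

    // Inner merge join of two inputs that are already sorted by key,
    // multiplying the values of matching keys. Illustrates the idea behind
    // a map-side join in plain Java; it does not use Hadoop classes.
    static List<String> mergeJoin(List<Map.Entry<String, Long>> left,
                                  List<Map.Entry<String, Long>> right) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < left.size() && j < right.size()) {
            String lk = left.get(i).getKey();
            String rk = right.get(j).getKey();
            int cmp = lk.compareTo(rk);
            if (cmp < 0) {
                i++; // key only in the left input: skip it
            } else if (cmp > 0) {
                j++; // key only in the right input: skip it
            } else {
                out.add(lk + "," + (left.get(i).getValue() * right.get(j).getValue()));
                i++;
                j++;
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<Map.Entry<String, Long>> a = List.of(Map.entry("apple", 10L), Map.entry("orange", 20L));
        List<Map.Entry<String, Long>> b = List.of(Map.entry("apple", 5L), Map.entry("orange", 40L));
        mergeJoin(a, b).forEach(System.out::println);
    }
}
```

The sort-and-partition requirement exists so that each map task sees the same key range from every input; only then does a single merge pass find all matches.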

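Alternatively, in the mapper you can call context.getInputSplit() to get the InputSplit. If you are using TextInputFormat, that split is a FileSplit, and calling getPath() on it gives you the file the record came from. This does not work with CompositeInputFormat, though: its splits are not FileSplits, and the mapper receives the joined values as Writables inside a TupleWritable.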

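import org.apache.hadoop.mapreduce.lib.input.FileSplit;

// Inside map(): with TextInputFormat the split is a FileSplit,
// so its path tells you which input file the current record came from.
String fileName = ((FileSplit) context.getInputSplit()).getPath().getName();

if (fileName.equals("a.txt")) {
    // TODO: handle records from the first file (a.txt)
} else {
    // TODO: handle records from the second file (b.txt)
}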
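With the source known from the split, the mapper no longer needs the comma/semicolon trick to tell the files apart. The plain-Java sketch below shows only the parsing and tagging step. `fileName` stands in for the value a real mapper would obtain from the FileSplit, and `TaggedValue` is a hypothetical replacement for the question's MyMapWritable; no Hadoop classes are used.

```java
import java.util.Map;

public class TagBySourceFileSketch {

    // Hypothetical replacement for MyMapWritable: the source tag plus the parsed number.
    static final class TaggedValue {
        final int source; // 1 = a.txt, 2 = any other file (b.txt here)
        final long value;

        TaggedValue(int source, long value) {
            this.source = source;
            this.value = value;
        }
    }

    // Parses one input line and tags it by its source file. In a real Mapper, fileName
    // would be ((FileSplit) context.getInputSplit()).getPath().getName().
    static Map.Entry<String, TaggedValue> map(String fileName, String line) {
        boolean fromA = fileName.equals("a.txt");
        // The tag comes from the file name; the delimiter is only chosen to match each file's format.
        String[] parts = line.split(fromA ? "," : ";", 2);
        String key = parts[0].trim();
        long value = Long.parseLong(parts[1].trim());
        return Map.entry(key, new TaggedValue(fromA ? 1 : 2, value));
    }

    public static void main(String[] args) {
        Map.Entry<String, TaggedValue> e = map("b.txt", "orange;40");
        System.out.println(e.getKey() + " -> source " + e.getValue().source + ", value " + e.getValue().value);
    }
}
```

Running `main` prints `orange -> source 2, value 40`. The tagged pairs can then be grouped and multiplied in the reducer exactly as before.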
