Reading Two Different Input Source Files in a Hadoop Mapper

Question

I have a tool that chains many Mappers and Reducers, and at some point I need to merge the results of previous map-reduce steps. For example, I have two data files as input:

/input/a.txt
apple,10
orange,20

/input/b.txt
apple;5
orange;40

The result should be c.txt, where c.value = a.value * b.value:

/output/c.txt
apple,50   // 10 * 5
orange,800 // 20 * 40

How can I do that? I solved it with a simple Key => MyMapWritable (type = 1 or 2, value) and by merging (actually, multiplying) the data in the reducers. It works, but:

It feels like this could be done more simply (it smells bad).
Is it possible to find out inside the Mapper which file (a.txt or b.txt) supplied the current record? At the moment I just use different separators, comma and semicolon. :(

Tags: mapreduce, hadoop

Asked by dmytrivv, 15 Jul '12 at 20:05

2 Answers

Chris White · Answer 1 · 2012-07-15T20:16:46+0000

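First, assuming both files are partitioned and sorted in the same way, you can use CompositeInputFormat to perform a map-side join. I don't think it has been ported to the new mapreduce api, however.

Second, in the mapper you can call context.getInputSplit(), which returns the InputSplit. If you are using TextInputFormat, you can cast it to a FileSplit and then call getPath() to get the file name. I don't think this works with CompositeInputFormat, though, because you will not know which file the Writables in the TupleWritable came from.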
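To illustrate why a map-side join needs both inputs sorted by key in the same way, here is a minimal plain-Java sketch of a sorted merge join over the sample data from the question. It does not use CompositeInputFormat or any Hadoop class; the class and method names (SortedMergeJoinSketch, parse, join) are hypothetical, and it assumes each key appears at most once per input.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class SortedMergeJoinSketch {

    /** Parses "key<delimiter>value" into a two-element array {key, value}. */
    static String[] parse(String line, String delimiter) {
        String[] parts = line.split(delimiter, 2);
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    /**
     * Joins two inputs that are already sorted by key (assumption: unique keys)
     * and multiplies the values of matching keys. Keys found in only one
     * input are skipped.
     */
    static List<String> join(List<String> a, List<String> b) {
        List<String> out = new ArrayList<>();
        int i = 0;
        int j = 0;
        while (i < a.size() && j < b.size()) {
            String[] ra = parse(a.get(i), ","); // a.txt uses a comma
            String[] rb = parse(b.get(j), ";"); // b.txt uses a semicolon
            int cmp = ra[0].compareTo(rb[0]);
            if (cmp == 0) {
                long product = Long.parseLong(ra[1]) * Long.parseLong(rb[1]);
                out.add(ra[0] + "," + product);
                i++;
                j++;
            } else if (cmp < 0) {
                i++; // key only in a: advance a
            } else {
                j++; // key only in b: advance b
            }
        }
        return out;
    }

    public static void main(String[] args) {
        List<String> a = Arrays.asList("apple,10", "orange,20");
        List<String> b = Arrays.asList("apple;5", "orange;40");
        join(a, b).forEach(System.out::println);
    }
}
```

Because both lists advance in lockstep, no shuffle or reducer is needed; that only works when the two inputs share the same sort order.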

Ashish · Answer 2 · 2013-07-01T07:49:28+0000

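// Inside Mapper.map(); needs org.apache.hadoop.mapreduce.lib.input.FileSplit.
// The cast is valid for FileInputFormat-based inputs such as TextInputFormat.
String fileName = ((FileSplit) context.getInputSplit()).getPath()
                .toString();

if (fileName.contains("file_1")) {
   // TODO: handle records from file 1
} else {
   // TODO: handle records from file 2
}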
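As a rough sketch of how the file-name check could feed the join from the question, the plain-Java simulation below picks the delimiter from the source file name in the "map" step and multiplies the grouped values in the "reduce" step. It uses no Hadoop classes: the fileName string stands in for what getPath() would return, a TreeMap stands in for the shuffle, and all names (TaggedJoinSketch, map, reduce, run) are hypothetical.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

class TaggedJoinSketch {

    /** "Map" step: chooses the delimiter from the source file name, emits {key, value}. */
    static String[] map(String fileName, String line) {
        String delimiter = fileName.endsWith("a.txt") ? "," : ";";
        String[] parts = line.split(delimiter, 2);
        return new String[] { parts[0].trim(), parts[1].trim() };
    }

    /** "Reduce" step: multiplies all values collected for one key. */
    static long reduce(List<Long> values) {
        long product = 1;
        for (long v : values) {
            product *= v;
        }
        return product;
    }

    /** Runs map, a simulated shuffle (group by key, sorted), then reduce. */
    static List<String> run(Map<String, List<String>> inputs) {
        Map<String, List<Long>> grouped = new TreeMap<>();
        for (Map.Entry<String, List<String>> file : inputs.entrySet()) {
            for (String line : file.getValue()) {
                String[] kv = map(file.getKey(), line);
                grouped.computeIfAbsent(kv[0], k -> new ArrayList<>())
                       .add(Long.parseLong(kv[1]));
            }
        }
        List<String> out = new ArrayList<>();
        for (Map.Entry<String, List<Long>> e : grouped.entrySet()) {
            // Inner join: keep only keys that received a value from both files.
            if (e.getValue().size() == 2) {
                out.add(e.getKey() + "," + reduce(e.getValue()));
            }
        }
        return out;
    }

    public static void main(String[] args) {
        Map<String, List<String>> inputs = new TreeMap<>();
        inputs.put("/input/a.txt", Arrays.asList("apple,10", "orange,20"));
        inputs.put("/input/b.txt", Arrays.asList("apple;5", "orange;40"));
        run(inputs).forEach(System.out::println);
    }
}
```

Since multiplication is commutative, the values need no type tag here once the mapper knows which file it is reading; a non-commutative operation would still need the source recorded alongside each value.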
