Hadoop - how to use and reduce multiple inputs?

Mapper/Reducer 1 --> (key,value)
                      /   |   \
                     /    |    \
     Mapper/Reducer 2     |    Mapper/Reducer 4
     -> (oKey,oValue)     |    -> (xKey, xValue)
                          |
                          |
                    Mapper/Reducer 3
                    -> (aKey, aValue)

I have a log file that I aggregate with MR1. Mapper2, Mapper3, and Mapper4 each take the output of MR1 as their input. The jobs are chained.

MR1 Output:

User     {infos of user:[{data here},{more data},{etc}]}
..

MR2 Output:

timestamp       idCount
..

MR3 Output:

timestamp        loginCount
..

MR4 Output:

timestamp        someCount
..

I want to combine outputs from MR2-4: Final output ->

timestamp     idCount     loginCount   someCount
..
..
..

Is there a way to do this without Pig or Hive? I am using Java.

2 answers

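You can do this with MultipleInputs (see sample here): add the outputs of MR2, MR3, and MR4 as inputs to one final job and join them on timestamp in the reducer.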
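To make the join step concrete, here is a minimal sketch in plain Java that only simulates what such a final job would do; it does not use any Hadoop classes. In a real job, each of the three output directories would be registered with `MultipleInputs.addInputPath(job, path, inputFormatClass, mapperClass)`, each mapper would tag its count with its source, and the reducer would receive all tagged counts for one timestamp. The class name `TimestampJoin`, the column order, and the choice to print 0 for a missing count are assumptions made for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Plain-Java simulation of a reduce-side join on timestamp; no Hadoop classes are used.
class TimestampJoin {

    // Stand-in for the tagging mappers plus the shuffle: each source writes its
    // count into a fixed column (0 = idCount, 1 = loginCount, 2 = someCount)
    // of the row that belongs to its timestamp.
    private static void collect(Map<String, long[]> rows, int column, Map<String, Long> source) {
        for (Map.Entry<String, Long> e : source.entrySet()) {
            rows.computeIfAbsent(e.getKey(), k -> new long[3])[column] = e.getValue();
        }
    }

    // Stand-in for the reducer: one output line per timestamp. A count that a
    // source did not report stays 0 (an assumption; the question does not say
    // how missing timestamps should be handled).
    static List<String> join(Map<String, Long> idCounts,
                             Map<String, Long> loginCounts,
                             Map<String, Long> someCounts) {
        Map<String, long[]> rows = new TreeMap<>(); // sorted by timestamp
        collect(rows, 0, idCounts);
        collect(rows, 1, loginCounts);
        collect(rows, 2, someCounts);

        List<String> lines = new ArrayList<>();
        for (Map.Entry<String, long[]> e : rows.entrySet()) {
            long[] c = e.getValue();
            lines.add(e.getKey() + "\t" + c[0] + "\t" + c[1] + "\t" + c[2]);
        }
        return lines;
    }

    public static void main(String[] args) {
        Map<String, Long> ids = new TreeMap<>();
        ids.put("t1", 5L);
        ids.put("t2", 7L);
        Map<String, Long> logins = new TreeMap<>();
        logins.put("t1", 2L);
        Map<String, Long> some = new TreeMap<>();
        some.put("t2", 9L);

        // Prints one tab-separated line per timestamp:
        // timestamp, idCount, loginCount, someCount
        for (String line : join(ids, logins, some)) {
            System.out.println(line);
        }
    }
}
```

The fixed column per source plays the role of the tag a mapper would attach, which is what lets a single reducer tell the three kinds of counts apart.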


As far as I know, you cannot have an array of outputs in the reducer class. What comes to mind to solve your problem is this:

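Have MR1 emit one of the keys {a, b, c}, paired with the values {timestamp, idCount}, {timestamp, loginCount}, {timestamp, someCount} respectively, and merge MR2-4 into one job.

So, it looks like this: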

MR1 <inputKey, inputValue, outputKey, outputValue> where outputKey is
                                       "a" for outputValue {timestamp, idCount}
                                       "b" for outputValue {timestamp, loginCount}
                                       "c" for outputValue {timestamp, someCount}

MR2-4 <inputKey, inputValue, outputKey, outputValue> if inputKey is "a" do MR2
                                                     if inputKey is "b" do MR3
                                                     if inputKey is "c" do MR4

You can also use a Partitioner and a GroupComparator to control how {key/value} pairs are grouped, for example with a mapper/reducer key built from key + some_part_of_value.
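If the idea is a composite key, a rough sketch of the two pieces could look like the plain-Java code below. It uses no Hadoop classes. In a real job the same logic would sit in a `Partitioner` subclass and a comparator registered through `job.setPartitionerClass(...)` and `job.setGroupingComparatorClass(...)`. The key layout `<timestamp>#<tag>` and the class name `CompositeKeyDemo` are hypothetical.

```java
import java.util.Comparator;

// Plain-Java illustration of composite keys; no Hadoop classes are used.
class CompositeKeyDemo {

    // Hypothetical composite key layout: "<timestamp>#<tag>".
    // The natural key is the part before '#'.
    static String naturalKey(String compositeKey) {
        int sep = compositeKey.indexOf('#');
        return sep < 0 ? compositeKey : compositeKey.substring(0, sep);
    }

    // Partition on the natural key only, so every tag for one timestamp
    // reaches the same reducer. The mask keeps the hash non-negative.
    static int partition(String compositeKey, int numPartitions) {
        return (naturalKey(compositeKey).hashCode() & Integer.MAX_VALUE) % numPartitions;
    }

    // Grouping order: two keys fall into the same reduce group when their
    // natural keys are equal, whatever the tag is.
    static final Comparator<String> GROUPING =
            Comparator.comparing(CompositeKeyDemo::naturalKey);

    public static void main(String[] args) {
        String k1 = "2013-05-01T10#a";
        String k2 = "2013-05-01T10#b";
        System.out.println(partition(k1, 4) == partition(k2, 4)); // prints true
        System.out.println(GROUPING.compare(k1, k2) == 0);        // prints true
    }
}
```

Partitioning and grouping both ignore the tag, so records tagged "a", "b", and "c" for the same timestamp arrive together in one reduce call.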
