How to generate multiple file names at runtime in Hadoop?

I have some data in CSV format.

e.g. K1, K2, data1, data2, data3

Here my mapper sends the key to the reducer as K1K2 and the value as data1, data2, data3.

I want to save this data in several files, each named K1K2 (that is, the key the reducer receives). If I use the MultipleOutputs class, I have to declare the file names before the job starts. But I can only determine the key after the mapper has read the data. How should I proceed?

PS I'm new to this.

2 answers

You can create file names and pass them to MultipleOutputs in the reducer, for example:

private MultipleOutputs<Text, Text> out;

@Override
public void setup(Context context) {
   out = new MultipleOutputs<Text, Text>(context);
   ...
}

@Override
public void reduce(Text key, Iterable<Text> values, Context context) throws IOException, InterruptedException {
  for (Text t : values) {
    // generateFileName is your function; it returns the base name of the output file
    out.write(key, t, generateFileName(<parameter list...>));
  }
}

@Override
protected void cleanup(Context context) throws IOException, InterruptedException {
  // close MultipleOutputs so that every opened file is flushed
  out.close();
}

MultipleOutputs: https://hadoop.apache.org/docs/current2/api/org/apache/hadoop/mapreduce/lib/output/MultipleOutputs.html
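The answer above leaves generateFileName up to you. One possible shape for it, as a minimal sketch: the class name OutputNames, the single String parameter, and the rule of replacing unusual characters with '_' are all assumptions made for illustration, not part of Hadoop or of the original answer.

```java
// Hypothetical helper: turns a reducer key into a base output name.
// The signature and the sanitizing policy are assumptions, not a Hadoop API.
final class OutputNames {
    private OutputNames() {}

    static String generateFileName(String key) {
        // Drop surrounding whitespace left over from CSV parsing.
        String trimmed = key.trim();
        // Replace anything other than letters, digits, '_', '.' and '-' with '_',
        // so the key cannot introduce path separators or spaces.
        return trimmed.replaceAll("[^A-Za-z0-9_.-]", "_");
    }
}
```

In the reducer this could be called as out.write(key, t, OutputNames.generateFileName(key.toString())). Whether to sanitize at all depends on what your keys can contain.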


You can use MultipleOutputs:

public class YourReducer extends Reducer<Text, Value, Text, Value> {
    private Value result = null;
    private MultipleOutputs<Text, Value> out;

    @Override
    public void setup(Context context) {
        out = new MultipleOutputs<Text, Value>(context);
    }

    @Override
    public void reduce(Text key, Iterable<Value> values, Context context)
            throws IOException, InterruptedException {
        // do your code (compute result from values)
        // key.toString() gives the key as a String for the base output path
        out.write(key, result, "outputpath/" + key.toString());
    }

    @Override
    public void cleanup(Context context) throws IOException, InterruptedException {
        out.close();
    }
}

outputpath/K1
          /K2
          /K3
 .......

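For this, set the output format with LazyOutputFormat.setOutputFormatClass() instead of setting a FileOutputFormat class directly; some setups use job.setOutputFormatClass(NullOutputFormat.class) instead. In either case, the base output directory is still set with FileOutputFormat.setOutputPath().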
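A note on the layout above: as far as I know, the third argument to write() is treated as a base path, and the file output format appends the task type and a five-digit partition number to it, so the reducer output usually appears as files such as outputpath/K1-r-00000 under the job output directory. The plain-Java sketch below only mimics that naming so you know what to look for. It is not Hadoop code, the class and method names are hypothetical, and it ignores any extension the output format may add.

```java
import java.util.Locale;

// Hypothetical illustration of how reducer output files are typically named
// when MultipleOutputs is given a base output path. Not part of Hadoop.
final class ReducerOutputFileName {
    private ReducerOutputFileName() {}

    static String of(String baseOutputPath, int partition) {
        // "-r-" marks reducer output; the partition number is zero-padded to 5 digits.
        return String.format(Locale.ROOT, "%s-r-%05d", baseOutputPath, partition);
    }
}
```

For example, ReducerOutputFileName.of("outputpath/K1", 0) yields "outputpath/K1-r-00000".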
