How to use class generated by sqoop in MapReduce?

A sqoop import generates a Java file containing a class that gives MapReduce code access to the column data of each row. (The import was done as text, without the --as-sequencefile option, with one record per line and commas between columns.) But how do we actually use this class?

I found a public parse() method in this class that takes Text as input and populates all members of the class, so for practice I modified the wordcount application to convert each line of text that TextInputFormat passes to the mapper into an instance of the class generated by sqoop. But the call to parse() fails to compile with "unreported exception com.cloudera.sqoop.lib.RecordParser.ParseError; must be caught or declared to be thrown".

Is it possible to do it this way, or is a custom InputFormat needed to fill the class with the data from each record?

1 answer

Well, it seems obvious once you figure it out, but for a Java beginner it can take some time.

First set up your project: just add the .java file generated by sqoop to your source folder. I use Eclipse and import it into the source folder of my project.

Then make sure the build path of the Java project is configured correctly:

Add the following jar files under project Properties / Java Build Path / Libraries / Add External JARs (for Hadoop CDH4+):

/usr/lib/hadoop/hadoop-common.jar 
/usr/lib/hadoop-[version]-mapreduce/hadoop-core.jar
/usr/lib/sqoop/sqoop-[sqoop-version]-cdh[cdh-version].jar

Then adapt the MapReduce source code. First, configure the job:

public int run(String[] args) throws Exception
{
 Job job = new Job(getConf());
 job.setJarByClass(YourClass.class);
 job.setMapperClass(SqoopImportMap.class);
 job.setReducerClass(SqoopImportReduce.class);

 // both calls take an org.apache.hadoop.fs.Path, not a String
 FileInputFormat.addInputPath(job, new Path("hdfs_path_to_your_sqoop_imported_file"));
 FileOutputFormat.setOutputPath(job, new Path("hdfs_output_path"));

 // I simply use Text as output for the mapper, but it can be any class you designed
 // as long as it implements Writable (WritableComparable for keys)
 job.setMapOutputKeyClass(Text.class);
 job.setMapOutputValueClass(Text.class);

 job.setOutputKeyClass(Text.class);
 job.setOutputValueClass(Text.class);
 ...

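Then write the mapper. Suppose the Java file generated by sqoop is called Sqimp.java and the imported table has the columns id, name, age; the mapper then looks like this: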

 public static class SqoopImportMap
 extends Mapper<LongWritable, Text, Text, Text>
 {

 @Override
 public void map(LongWritable k, Text v, Context context)
 {
  Sqimp s = new Sqimp();
  try
  {
   // this is where the code generated by sqoop is used.
   // it parses one line of the imported data into an instance of the generated class,
   // to let you access the data inside the columns easily
   s.parse(v);
  }
  catch (ParseError pe) // import com.cloudera.sqoop.lib.RecordParser.ParseError
  {
   // do something if there is an error, then skip this line
   return;
  }

  try
  {
   // now the imported data is accessible, e.g.
   // (if your generated class keeps its fields private,
   // use the accessor methods in the generated file instead)
   if (s.age > 30)
   {
    // write the selected data to the mapper output as a key/value pair.
    // Text has no constructor for numbers, so convert to String first.
    context.write(new Text(String.valueOf(s.age)), new Text(String.valueOf(s.id)));
   }
  }
  catch (Exception ex)
  {
   // do something about the error
  }
 }
}
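
If you want to see why the try/catch around parse() is required without setting up Hadoop at all, here is a minimal stand-alone sketch. Everything in it is a stand-in I made up for illustration: PersonRecord plays the role of the sqoop-generated class (columns id, name, age as above), and the nested ParseError plays the role of RecordParser.ParseError. The real generated class will look different, but the rule is the same: parse() declares a checked exception, so the caller has to catch it or declare it.

```java
// Stand-alone sketch: no Hadoop or sqoop jars needed.
// PersonRecord and ParseError are hypothetical stand-ins for the
// sqoop-generated class and RecordParser.ParseError.
class ParseSketch {

    /** Checked exception, like the one the generated parse() declares. */
    static class ParseError extends Exception {
        ParseError(String message) { super(message); }
    }

    /** Hypothetical record with the columns id, name, age. */
    static class PersonRecord {
        int id;
        String name;
        int age;

        /** Fills the fields from one comma-separated line. */
        void parse(CharSequence line) throws ParseError {
            String[] cols = line.toString().split(",", -1);
            if (cols.length != 3) {
                throw new ParseError("expected 3 columns, got " + cols.length);
            }
            try {
                id = Integer.parseInt(cols[0].trim());
                name = cols[1];
                age = Integer.parseInt(cols[2].trim());
            } catch (NumberFormatException e) {
                throw new ParseError("bad number in: " + line);
            }
        }
    }

    /** Returns the age from a line, or -1 if the line cannot be parsed. */
    static int ageOrMinusOne(String line) {
        PersonRecord r = new PersonRecord();
        try {
            // without this try/catch (or a throws clause) javac reports
            // "unreported exception ... must be caught or declared to be thrown"
            r.parse(line);
        } catch (ParseError e) {
            return -1;
        }
        return r.age;
    }

    public static void main(String[] args) {
        System.out.println(ageOrMinusOne("7,alice,42"));          // prints 42
        System.out.println(ageOrMinusOne("not,a,record,at,all")); // prints -1
    }
}
```

In a real mapper you would usually skip or count the bad line inside the catch block instead of returning a marker value like -1.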