Hadoop: AWS EMR input and output paths

I am trying to get Hadoop running on Amazon Elastic MapReduce. My data and my JAR are stored in AWS S3. When I set up the job flow step, I pass the following JAR arguments:

s3n://my-hadoop/input s3n://my-hadoop/output

Below is my main Hadoop function:

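import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.TextInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyMR {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        Job job = new Job(conf, "MyMR");
        job.setJarByClass(MyMR.class);
        job.setMapperClass(MyMapper.class);          // mapper defined elsewhere
        job.setReducerClass(CountryReducer.class);   // reducer defined elsewhere
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);
        job.setInputFormatClass(TextInputFormat.class);
        // args[0] = input path, args[1] = output path (e.g. s3n://my-hadoop/input, s3n://my-hadoop/output)
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}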

However, the job flow fails, and the stderr log shows:

Exception in thread "main" java.lang.ClassNotFoundException: s3n://my-hadoop/input
    at java.lang.Class.forName0(Native Method)
    at java.lang.Class.forName(Class.java:247)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:180)

So how do I specify the input and output paths in AWS EMR?


This is the classic error you get when the JAR does not define its main class. If the JAR's manifest does not tell Hadoop which class to run, RunJar treats the first argument as the main class name, which is why it tries to load s3n://my-hadoop/input as a class here.
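To make the mechanism concrete, here is a simplified Java model of how RunJar chooses the class to run and which arguments it forwards to main(). It is a hypothetical sketch of the behavior described above, not Hadoop's source; the names RunJarArgsDemo, resolve and Resolved are illustrative only.

```java
import java.util.Arrays;

/**
 * Hypothetical, simplified model of Hadoop's RunJar argument handling.
 * Not Hadoop's actual code: it only mirrors the behavior of using the
 * manifest's Main-Class if present, otherwise consuming the first argument.
 */
public class RunJarArgsDemo {

    /** The class RunJar would load, plus the arguments it would pass to main(). */
    static final class Resolved {
        final String mainClass;
        final String[] mainArgs;

        Resolved(String mainClass, String[] mainArgs) {
            this.mainClass = mainClass;
            this.mainArgs = mainArgs;
        }
    }

    /**
     * @param manifestMainClass value of the manifest's Main-Class attribute, or null if absent
     * @param argsAfterJar      the arguments that follow the JAR path
     */
    static Resolved resolve(String manifestMainClass, String[] argsAfterJar) {
        int first = 0;
        String mainClass = manifestMainClass;
        if (mainClass == null) {
            if (argsAfterJar.length == 0) {
                throw new IllegalArgumentException("no Main-Class in manifest and no class argument");
            }
            // No Main-Class in the manifest: the first argument is taken as the class name.
            mainClass = argsAfterJar[first++];
        }
        // Everything after the consumed arguments is handed to main(String[] args).
        String[] rest = Arrays.copyOfRange(argsAfterJar, first, argsAfterJar.length);
        return new Resolved(mainClass, rest);
    }

    public static void main(String[] args) {
        // The question's step arguments, with no Main-Class in the manifest.
        Resolved broken = resolve(null, new String[] {"s3n://my-hadoop/input", "s3n://my-hadoop/output"});
        // prints "s3n://my-hadoop/input [s3n://my-hadoop/output]"
        System.out.println(broken.mainClass + " " + Arrays.toString(broken.mainArgs));

        // Passing the class name first: main() still sees the two paths as args[0] and args[1].
        Resolved fixed = resolve(null, new String[] {"MyMR", "s3n://my-hadoop/input", "s3n://my-hadoop/output"});
        // prints "MyMR [s3n://my-hadoop/input, s3n://my-hadoop/output]"
        System.out.println(fixed.mainClass + " " + Arrays.toString(fixed.mainArgs));
    }
}
```

In the first case the input path ends up as the "class name", which matches the ClassNotFoundException in the question. Once the class name comes first, it is stripped before main() is called, so the question's code can keep reading args[0] and args[1].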

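So either specify the main class in the manifest when you build the JAR, or pass the fully qualified main class name as the first argument of the Hadoop step, with the input and output paths as the second and third arguments (args[1] and args[2] of the step's argument list; RunJar strips the class name before calling main(), so inside main() they are still args[0] and args[1]), for example: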

ruby elastic-mapreduce -j $jobflow --jar s3://my-jar-location/myjar.jar --arg com.somecompany.MyMainClass --arg s3://input --arg s3://output
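If you would rather keep passing only the two paths, the JAR needs a Main-Class attribute in its manifest, because RunJar only falls back to reading the class name from the arguments when that attribute is missing. The following minimal sketch uses only java.util.jar to write a JAR whose manifest declares a main class and to read that attribute back; ManifestMainClass, mainClassOf and writeJarWithMainClass are hypothetical helper names, not part of Hadoop or EMR.

```java
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.jar.Attributes;
import java.util.jar.JarFile;
import java.util.jar.JarOutputStream;
import java.util.jar.Manifest;

public class ManifestMainClass {

    /** Returns the Main-Class attribute of the JAR's manifest, or null if there is none. */
    static String mainClassOf(File jar) throws IOException {
        try (JarFile jarFile = new JarFile(jar)) {
            Manifest manifest = jarFile.getManifest();
            return manifest == null ? null : manifest.getMainAttributes().getValue(Attributes.Name.MAIN_CLASS);
        }
    }

    /** Writes an otherwise empty JAR whose manifest declares the given Main-Class. */
    static void writeJarWithMainClass(File jar, String mainClass) throws IOException {
        Manifest manifest = new Manifest();
        // The JAR specification expects Manifest-Version in the main section.
        manifest.getMainAttributes().put(Attributes.Name.MANIFEST_VERSION, "1.0");
        manifest.getMainAttributes().put(Attributes.Name.MAIN_CLASS, mainClass);
        try (JarOutputStream out = new JarOutputStream(new FileOutputStream(jar), manifest)) {
            // A real build would add the compiled .class entries here.
        }
    }

    public static void main(String[] args) throws IOException {
        File jar = File.createTempFile("myjar", ".jar");
        jar.deleteOnExit();
        writeJarWithMainClass(jar, "com.somecompany.MyMainClass");
        System.out.println(mainClassOf(jar)); // prints "com.somecompany.MyMainClass"
    }
}
```

In a normal build you would not write the manifest by hand: the JDK's `jar` tool sets it with the `e` option (for example `jar cfe myjar.jar com.somecompany.MyMainClass ...`), and build tools such as Maven's jar plugin have an equivalent setting. The read-back helper is still handy for checking which case your JAR falls into before submitting the step.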

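Note that the JAR step needs 3 arguments (not 2): the first is the main class, the second is the input path, and the third is the output path.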
