Amazon EMR: Passing XML File or Properties to JAR

I run several MapReduce jobs on a Hadoop cluster from a single JAR file. The JAR's main method accepts an XML file as a command-line parameter. The XML file contains the input and output paths for each job (as name/value property pairs), and I use them to configure each MapReduce job. I can load the paths into the configuration like this:

    Configuration config = new Configuration(false);
    config.addResource(new FileInputStream(args[0]));
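For readers unfamiliar with the file layout, here is a rough sketch of what such a name/value XML file might look like and how its pairs can be read. Hadoop's Configuration does this parsing itself; the JDK-only parser below is just a stand-in to make the structure visible. The property names (job1.input.path, job1.output.path), their values, and the class and method names are all hypothetical.

```java
import java.io.ByteArrayInputStream;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;

public class JobPathsExample {
    // Hypothetical file contents in the Hadoop configuration layout;
    // the property names and paths are made up for illustration.
    static final String SAMPLE_XML =
        "<?xml version=\"1.0\"?>\n"
      + "<configuration>\n"
      + "  <property><name>job1.input.path</name><value>/data/in</value></property>\n"
      + "  <property><name>job1.output.path</name><value>/data/out</value></property>\n"
      + "</configuration>\n";

    // Reads every <property> element and collects its <name>/<value> pair,
    // preserving the order in which the properties appear in the file.
    static Map<String, String> readProperties(InputStream in) throws Exception {
        Document doc = DocumentBuilderFactory.newInstance().newDocumentBuilder().parse(in);
        NodeList props = doc.getElementsByTagName("property");
        Map<String, String> result = new LinkedHashMap<>();
        for (int i = 0; i < props.getLength(); i++) {
            Element p = (Element) props.item(i);
            String name = p.getElementsByTagName("name").item(0).getTextContent().trim();
            String value = p.getElementsByTagName("value").item(0).getTextContent().trim();
            result.put(name, value);
        }
        return result;
    }

    public static void main(String[] args) throws Exception {
        InputStream in = new ByteArrayInputStream(SAMPLE_XML.getBytes(StandardCharsets.UTF_8));
        System.out.println(readProperties(in));
    }
}
```

In the actual job, config.get("job1.input.path") on the loaded Configuration would play the role of the map lookup here.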

Now I am trying to run the JAR using Amazon Elastic MapReduce. I tried uploading the XML file to S3, but of course using FileInputStream to load the path data from S3 does not work (it throws a FileNotFoundException).

How can I pass the XML file to the JAR when using EMR?

(I looked at bootstrap actions, but as far as I can tell they are meant for specifying Hadoop-specific configuration.)

Any insight would be appreciated. Thanks.

1 answer

If you add a simple bootstrap action that runs

    hadoop fs -copyToLocal s3n://bucket/key.xml /target/path/on/local/filesystem.xml

you can open a FileInputStream on /target/path/on/local/filesystem.xml, just as you intended. The bootstrap action runs on every master and slave machine in the cluster, so they will all have a local copy.
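On the Java side, a small guard can make a missing copy easier to diagnose. This is only a sketch, assuming the bootstrap action has already placed the file at the local path passed to the JAR; the class and method names (LocalConfigOpener, openLocalConfig) are hypothetical, and it uses only the JDK so it can be tried without a cluster.

```java
import java.io.FileInputStream;
import java.io.FileNotFoundException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class LocalConfigOpener {
    // Opens the locally copied config file, failing with a descriptive
    // message if the bootstrap action did not put it where we expect.
    static InputStream openLocalConfig(String localPath) throws FileNotFoundException {
        Path p = Paths.get(localPath);
        if (!Files.isRegularFile(p)) {
            throw new FileNotFoundException(
                "Expected the bootstrap action to copy the config to " + p.toAbsolutePath());
        }
        return new FileInputStream(p.toFile());
    }

    public static void main(String[] args) throws Exception {
        if (args.length != 1) {
            System.err.println("usage: LocalConfigOpener <local-config-path>");
            return;
        }
        // Only checks that the file can be opened and is non-empty.
        try (InputStream in = openLocalConfig(args[0])) {
            System.out.println("Config is readable: " + (in.read() != -1));
        }
    }
}
```

In the job's main method, the stream returned by openLocalConfig(args[0]) could be handed to config.addResource(...) in place of new FileInputStream(args[0]), with the local path supplied as the JAR argument.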

To add this bootstrap action, create a shell script file that contains the above command, upload it to S3, and specify its S3 location as the bootstrap action path. Unfortunately, a shell script in S3 is currently the only allowed type of bootstrap action.
