Copying files from Amazon S3 to HDFS using S3DistCp fails

I am trying to copy files from S3 to HDFS using a job flow in EMR. When I run the command below, the job flow starts successfully, but it gives me an error when it tries to copy a file to HDFS. Do I need to set any input file permissions?

Command:

./elastic-mapreduce --jobflow j-35D6JOYEDCELA --jar s3://us-east-1.elasticmapreduce/libs/s3distcp/1.latest/s3distcp.jar --args '--src,s3://odsh/input/,--dest,hdfs:///Users'

Output:

Task TASKID="task_201301310606_0001_r_000000" TASK_TYPE="REDUCE" TASK_STATUS="FAILED" FINISH_TIME="1359612576612" ERROR="java.lang.RuntimeException: Reducer task could not copy 1 file: s011.01121_01101_01101_01101_011_011 etc. at com.amazon.external.elasticmapreduce.s3distcp.CopyFilesReducer.close(CopyFilesReducer.java:70) at org.apache.hadoop.mapred.ReduceTask.runOldReducer(ReduceTask.java:538) at org.apache.hadoop.mapred.ReduceTask.run(ReduceTask.java:429) at org.apache.hadoop.mapred.Child$4.run(Child.java:255) at java.security.AccessController.doPrivileged(Native Method) at javax.security.auth.Subject.doAs(Subject.java:396) at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1132) at org.apache.hadoop.mapred.Child.main(Child.java:249)


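I am getting the same exception. It looks like a race condition that occurs when CopyFilesReducer uses several CopyFilesRunable instances to download files from S3. The threads appear to share the same temp directory, so a thread that finishes first can remove it while another thread is still using it.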

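I have reported the problem to AWS. In the meantime, you can work around it by forcing the reducer to use a single thread, which means setting s3DistCp.copyfiles.mapper.numWorkers to 1.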


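Changing the number of workers did not work for me; s3distcp still failed. Increasing the heap size of the task JVM (via -D mapred.child.java.opts=-Xmx1024m) solved it for me.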

Example usage:

hadoop jar /home/hadoop/lib/emr-s3distcp-1.0.jar \
    -D mapred.child.java.opts=-Xmx1024m \
    --src s3://source/ \
    --dest hdfs:///dest/ --targetSize 128 \
    --groupBy '.*\.([0-9]+-[0-9]+-[0-9]+)-[0-9]+\..*' \
    --outputCodec gzip

I see the same problem, caused by a race condition. Passing -Ds3DistCp.copyfiles.mapper.numWorkers=1 helps to avoid it.

I hope Amazon fixes this error.
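For anyone who wants to try the single-worker flag, here is a minimal sketch of how it could be passed to a hadoop jar invocation of s3distcp. The jar path, source, and destination are only the placeholder values used elsewhere in this thread, so adjust them for your cluster. RUN is a hypothetical switch I added so that the script only prints the command unless you explicitly ask it to execute.

```shell
#!/bin/sh
# Sketch only: the values below are placeholders from this thread, not verified defaults.
S3DISTCP_JAR="${S3DISTCP_JAR:-/home/hadoop/lib/emr-s3distcp-1.0.jar}"
SRC="${SRC:-s3://source/}"
DEST="${DEST:-hdfs:///dest/}"

# The -D generic option goes right after the jar and before the s3distcp
# arguments. This assumes none of the values contain spaces.
CMD="hadoop jar $S3DISTCP_JAR -Ds3DistCp.copyfiles.mapper.numWorkers=1 --src $SRC --dest $DEST"

# Always show what would be run.
echo "$CMD"

# RUN is a hypothetical opt-in switch: set RUN=1 on the master node to execute.
if [ "${RUN:-0}" = "1" ]; then
  $CMD
fi
```

Run it as is to check the command line, then rerun with RUN=1 on the master node once the paths look right.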
