Batch rename in hadoop

Question

How can I rename all the files in an HDFS directory to have the .lzo extension? .lzo.index files should not be renamed.

For example, this directory listing:

file0.lzo
file0.lzo.index
file0.lzo_copy_1

can be renamed to:

file0.lzo
file0.lzo.index
file0.lzo_copy_1.lzo

These files are compressed with lzo, and they need the .lzo extension to be recognized by Hadoop.

+5

bash hadoop file-rename

beefyhalo Feb 06 '13 at 18:17

3 answers

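When I had to rename many files, I went looking for an efficient solution and came across this question and thi-duong-nguyen's remark that renaming many files is very slow. I implemented a Java solution for batch renames, which I can recommend because it is much faster. The basic idea is to use the rename() method of org.apache.hadoop.fs.FileSystem: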

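import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

Configuration conf = new Configuration();
conf.set("fs.defaultFS", "hdfs://master:8020");  // NameNode address; adjust for your cluster
FileSystem dfs = FileSystem.get(conf);
dfs.rename(from, to);  // returns false if the rename did not succeed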

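Here from and to are org.apache.hadoop.fs.Path objects. The easiest way is to create a list of the files to be renamed (including their new names) and feed this list to the Java program.

I have published a complete implementation that reads such a mapping from STDIN. It renamed 100 files in less than four seconds (renaming 7000 files took the same time!), while the hdfs dfs -mv approach takes 4 minutes to rename 100 files.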
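The answer does not show the input format its program expects, so the following is only a sketch of how such a rename list could be produced in the shell. It assumes one HDFS path per line on stdin and writes tab-separated "from, to" pairs; both the function name make_rename_map and the tab-separated layout are assumptions made for illustration, not the format of the published tool.

```shell
# Read one HDFS path per line from stdin and write "from<TAB>to" lines,
# where "to" is the same path with .lzo appended.
# The tab-separated layout is an assumption, not a documented format.
make_rename_map() {
  while IFS= read -r path; do
    [ -n "$path" ] || continue            # skip blank lines
    printf '%s\t%s.lzo\n' "$path" "$path"
  done
}

# Example with a made-up path:
printf '%s\n' /user/foo/bar/file0.lzo_copy_1 | make_rename_map
```

Using a tab as the separator keeps paths that contain spaces intact, which a space-separated list would not.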

+8

Robert 16 . '14 20:23

We created a utility for bulk renaming of files in HDFS: https://github.com/tenaris/hdfs-rename . The tool is limited, but if you want, you can contribute improvements such as recursion, awk regex syntax, and so on.

+1

Ameba spugnosa Aug 4 '16 at 13:00

mt_ · Accepted Answer · 2013-02-06T18:32:52+0000

If you don't want to write Java code for this, I think using the HDFS command-line API is the best choice:

mv in hadoop

hadoop fs -mv URI [URI …] <dest>

You can get the paths using a small one-liner:

% hadoop fs -ls /user/foo/bar | awk  '!/^d/ {print $8}'

/user/foo/bar/blacklist
/user/foo/bar/books-eng
...

awk drops the directories from the output. Now you can put these files in a variable:

% files=$(hadoop fs -ls /user/foo/bar | awk  '!/^d/ {print $8}')

and rename each file.

% for f in $files; do hadoop fs -mv "$f" "$f.lzo"; done

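You can use awk to filter the files by other criteria too. This should remove files that match the regex nolzo. However, it's untested. This way you can write flexible filters.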

% files=$(hadoop fs -ls /user/foo/bar | awk  '!/^d|nolzo/ {print $8}' )

Test whether it works by replacing the hadoop command with echo:

% for f in $files; do echo "$f" "$f.lzo"; done
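Applied to the directory in the question, the filter would have to skip names that already end in .lzo or .lzo.index rather than match nolzo. A minimal sketch, assuming the `hadoop fs -ls` layout used above (path in the eighth column) and paths without whitespace; the function name lzo_candidates and the sample listing are made up for illustration, and this has not been run against a real cluster:

```shell
# Read an `hadoop fs -ls` listing from stdin and print the paths that
# still need the .lzo extension.
lzo_candidates() {
  awk '
    /^d/ || NF < 8          { next }   # directory or "Found N items" header
    $8 ~ /\.lzo(\.index)?$/ { next }   # already ends in .lzo or .lzo.index
    { print $8 }
  '
}

# Dry run on a made-up listing: echo the commands instead of running them.
lzo_candidates <<'EOF' | while IFS= read -r f; do echo hadoop fs -mv "$f" "$f.lzo"; done
Found 3 items
-rw-r--r--   3 foo supergroup   1024 2013-02-06 18:00 /user/foo/bar/file0.lzo
-rw-r--r--   3 foo supergroup     64 2013-02-06 18:00 /user/foo/bar/file0.lzo.index
-rw-r--r--   3 foo supergroup   1024 2013-02-06 18:00 /user/foo/bar/file0.lzo_copy_1
EOF
```

To use it for real, pipe `hadoop fs -ls /user/foo/bar` into lzo_candidates instead of the sample listing, and drop the echo once the printed commands look right.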

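Edit: Updated the examples to use awk instead of sed for more reliable output.

The "right" way to do it is probably the HDFS Java API. However, using the shell is probably faster and more flexible for most jobs.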
