Hadoop fs -get: copy only specific files

Is there a way to copy only certain files, for example based on file type, using fs -get or fs -copyToLocal? Note: I would like this to be recursive and to traverse the entire cluster.

I thought about posting this as an answer, but I cannot answer my own question yet.

Here's how we did it. Just wrote a quick shell script.

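    mkdir -p /tmp/txt

    # List every path on the cluster recursively, keep those ending in .txt,
    # and copy each one to the local directory.
    for F in $(hadoop fs -fs hdfs://namenode.mycluster -lsr / | grep '\.txt$' | awk '{print $NF}')
    do
       hadoop fs -fs hdfs://namenode.mycluster -copyToLocal "$F" /tmp/txt/
    done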
+3
3 answers

Here's how we did it. Just wrote a quick shell script.

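LOCAL_DIR=/tmp/txt
mkdir -p "$LOCAL_DIR"

# List every path on the cluster recursively, keep those ending in .txt,
# and copy each one into $LOCAL_DIR.
for F in $(hadoop fs -fs hdfs://namenode.mycluster -lsr / | grep '\.txt$' | awk '{print $NF}')
do
   hadoop fs -fs hdfs://namenode.mycluster -copyToLocal "$F" "$LOCAL_DIR"
done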
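
One caveat with the loop above: taking the last whitespace-separated field truncates paths that contain spaces, and a directory whose name ends in .txt is picked up as well. A rough sketch of a stricter filter follows. The function name list_files_with_ext is made up for illustration, and it assumes the usual eight-column listing layout (permissions, replication, owner, group, size, date, time, path). It uses -ls -R in the usage comment, which is the newer spelling of -lsr.

```shell
# Hypothetical helper: reads a recursive `hadoop fs -ls -R` listing on stdin and
# prints the full path of every regular file whose name ends in ".<ext>".
# Assumes the eight-column layout: perms repl owner group size date time path.
list_files_with_ext() {
    # Keep regular files only (permission string starts with "-"),
    # strip the first seven columns so paths with spaces survive,
    # then keep paths that end with the requested suffix.
    grep '^-' \
        | sed -E 's/^([^ ]+ +){7}//' \
        | awk -v suffix=".$1" 'length($0) >= length(suffix) && substr($0, length($0) - length(suffix) + 1) == suffix'
}

# Usage sketch (needs a working hadoop client, so it is left commented out):
#
# hadoop fs -fs hdfs://namenode.mycluster -ls -R / | list_files_with_ext txt |
# while IFS= read -r path; do
#     hadoop fs -fs hdfs://namenode.mycluster -copyToLocal "$path" /tmp/txt/
# done
```

Note that files with the same name in different HDFS directories will still collide in a single flat local directory.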
+4

You can pass a glob pattern to select the files to copy. Here is an example of using the command line in Hadoop. It uses put rather than get, but get should behave the same way.

Something like this: hadoop fs -get out/*

http://prazjain.wordpress.com/2012/02/15/how-to-run-hadoop-map-reduce-program-from-command-line/

+1

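Hadoop does not support the double-star (**) glob notation in paths, so there is no out-of-the-box way to do this: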

hadoop fs -get /**/*.txt /tmp

However, you can write your own code to do this: take a look at the current source for FsShell and pair it with the FileInputFormat listStatus method, which can be configured to accept a PathFilter. In that PathFilter, return true only if the Path has the desired file type.
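
If writing Java is more than you need and the directory tree has a known, bounded depth, a shell-only workaround (not the PathFilter approach described above) is to issue one glob per depth, since a single * in a Hadoop path glob stays within one path component. The sketch below only generates the patterns. The name glob_patterns, the /data root and the depth of 2 are made up for illustration.

```shell
# Hypothetical helper: prints one glob pattern per directory depth,
# from 0 up to MAX_DEPTH levels below ROOT.
# Usage: glob_patterns ROOT EXT MAX_DEPTH
glob_patterns() {
    prefix=${1%/}    # drop a trailing slash so that "/" becomes ""
    ext=$2
    max_depth=$3
    depth=0
    while [ "$depth" -le "$max_depth" ]; do
        printf '%s/*.%s\n' "$prefix" "$ext"
        prefix="$prefix/*"          # descend one more directory level
        depth=$((depth + 1))
    done
}

# Usage sketch (needs a working hadoop client, so it is left commented out).
# The pattern is quoted so that Hadoop, not the local shell, expands it;
# a depth with no matching files makes -get fail, hence the "|| true".
#
# glob_patterns /data txt 2 | while IFS= read -r pattern; do
#     hadoop fs -get "$pattern" /tmp/txt/ || true
# done
```

Running glob_patterns /data txt 2 prints /data/*.txt, /data/*/*.txt and /data/*/*/*.txt, one per line.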

+1