Download all .tar.gz files from a website directory using wget

So, I'm trying to create an alias or script to download all files with a specific extension from a website directory using wget, but I feel there should be an easier way than what I came up with.

This is what I came up with from Google searches and the man pages:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/

So, in the example above, I am trying to download all the .tar.gz files from the directory of precreated OpenVZ templates.

The command above works, but I have to specify --cut-dirs=2 manually to cut out the /template/precreated/ directory structure that would otherwise be created, and it also downloads the robots.txt file.

This isn't necessarily a problem, and robots.txt is easy to delete, but I was hoping I had just missed something in the man pages that would let me do the same without specifying the directory structure from ...
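As a rough illustration of what --cut-dirs does, the sketch below models how wget builds the local path when -nH and --cut-dirs=N are given, following the examples in the wget manual. It is a simplified model, not wget's actual code; the function name local_path and the file name foo.tar.gz are hypothetical. Note that the manual describes -nd (--no-directories) as saving every file into the current directory, so this mapping only matters when -nd is not given.

```shell
#!/usr/bin/env bash
# local_path URL CUT
# Hypothetical helper (not part of wget): models where
# `wget -r -nH --cut-dirs=CUT` would save the file named by URL.
# Simplified: ignores query strings, -P/--directory-prefix and -nd.
local_path() {
  local url="$1" cut="$2"
  local rest="${url#*://}"    # strip the scheme (http://, https://, ftp://)
  rest="${rest#*/}"           # -nH: strip the host name
  local file="${rest##*/}"    # last component is the file name
  local dir=""
  if [[ "$rest" == */* ]]; then
    dir="${rest%/*}"          # everything before the file name
  fi
  local -a parts=()
  if [[ -n "$dir" ]]; then
    IFS='/' read -ra parts <<< "$dir"
  fi
  # --cut-dirs=CUT: drop the first CUT directory components
  local -a kept=("${parts[@]:cut}")
  if (( ${#kept[@]} > 0 )); then
    local IFS='/'             # join the remaining directories with '/'
    printf '%s/%s\n' "${kept[*]}" "$file"
  else
    printf '%s\n' "$file"
  fi
}

# Example (foo.tar.gz is a made-up file name); prints: foo.tar.gz
local_path http://download.openvz.org/template/precreated/foo.tar.gz 2
```

With a cut of 1 the same URL maps to precreated/foo.tar.gz, and with 0 to template/precreated/foo.tar.gz, which is the structure the question wants to avoid.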

Thanks in advance for any help, it's much appreciated!

2 answers

Use the -R option

-R robots.txt,unwanted-file.txt

to give a comma-separated list of files to reject.

As for scripting it:

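URL=http://download.openvz.org/template/precreated/
# NF counts the host and the empty field after the trailing slash, so NF - 2 = number of directories
CUTS=$(echo "${URL#http://}" | awk -F '/' '{print NF - 2}')
wget -r -l1 -nH --cut-dirs="${CUTS}" --no-parent -A.tar.gz --no-directories -R robots.txt "${URL}"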

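This computes the --cut-dirs value from the number of subdirectories in your URL. Note that it assumes the URL starts with http:// and ends with a trailing slash.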
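The awk one-liner depends on the URL starting with http:// and ending with a slash. A slightly more tolerant version of the same calculation, as a bash sketch: count_cut_dirs is a hypothetical name, and it assumes the URL points at a directory rather than a file.

```shell
#!/usr/bin/env bash
# count_cut_dirs URL
# Hypothetical helper: prints the number of directory components after the
# host in a directory URL, i.e. the --cut-dirs value that, together with -nH,
# saves files directly into the current directory.
# Accepts any scheme (or none) and an optional trailing slash.
count_cut_dirs() {
  local path="${1#*://}"      # strip the scheme, if any
  if [[ "$path" == */* ]]; then
    path="${path#*/}"         # strip the host name
  else
    path=""                   # bare host name, no path
  fi
  path="${path%/}"            # ignore one trailing slash
  local -a parts=()
  if [[ -n "$path" ]]; then
    IFS='/' read -ra parts <<< "$path"
  fi
  echo "${#parts[@]}"
}

# Example; prints: 2
count_cut_dirs http://download.openvz.org/template/precreated/

# Plugging it into wget (only meaningful when the directory structure is kept):
#   URL=http://download.openvz.org/template/precreated/
#   wget -r -l1 -nH --cut-dirs="$(count_cut_dirs "$URL")" --no-parent -A.tar.gz "$URL"
```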


If this really annoys you and you need to do it a lot, just write a short two-line script that removes it for you:

wget -r -l1 -nH --cut-dirs=2 --no-parent -A.tar.gz --no-directories http://download.openvz.org/template/precreated/
rm robots.txt
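Since the question asks for something alias-like, a shell function is easier to reuse than a fixed two-line script. A minimal bash sketch; the name fetch_by_ext and the DRY_RUN switch are made up for illustration. It relies on my reading of the wget manual that -nd already saves every file into the current directory, which would make -nH and --cut-dirs unnecessary, and it keeps rm robots.txt as a fallback in case -R does not stop wget from saving that file.

```shell
#!/usr/bin/env bash
# fetch_by_ext URL EXT
# Hypothetical wrapper: fetch every file ending in EXT (e.g. .tar.gz) linked
# one level below URL, saving it into the current directory.
# Set DRY_RUN=1 to print the wget command instead of running it.
fetch_by_ext() {
  local url="$1" ext="$2"
  # -nd puts every file in the current directory, so -nH/--cut-dirs are omitted.
  local -a cmd=(wget -r -l1 --no-parent -nd -A "$ext" -R robots.txt "$url")
  if [[ -n "${DRY_RUN:-}" ]]; then
    printf '%s\n' "${cmd[*]}"
    return 0
  fi
  local status=0
  "${cmd[@]}" || status=$?
  # Clean up robots.txt in case wget saved it anyway. Caution: this also
  # removes any robots.txt that was already in the current directory.
  rm -f robots.txt
  return "$status"
}

# Example: show the command that would run for the OpenVZ templates.
DRY_RUN=1 fetch_by_ext http://download.openvz.org/template/precreated/ .tar.gz
```

The cleanup runs even when wget exits non-zero, because a recursive run often reports an error for a single missing link while still saving the other files.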
