Download images from website

I want to keep a local copy of a gallery on a website. The gallery shows each photo at domain.com/id/1 (the identifier increases in steps of 1), and the image itself is stored at pics.domain.com/pics/original/image.format. The exact markup for the image in the HTML is:

<div id="bigwall" class="right"> 
    <img border=0 src='http://pics.domain.com/pics/original/image.jpg' name='pic' alt='' style='top: 0px; left: 0px; margin-top: 50px; height: 85%;'> 
</div>

So, I want to write a script that does something like this (in pseudocode):

for (id = 1; id <= 151468; id++) {
     page = "http://domain.com/id/" + id.toString();
     src = returnSrc(page); // Searches the page's HTML for the img with name='pic' and returns its src as a string
     getImg(src); // Downloads the file named in src
}

I don’t know exactly how to do this. I suppose I could do it in bash: use wget to download the HTML, search it for http://pics.domain.com/pics/original/*.*, use wget again to save the file, delete the HTML file, increase the id, and repeat. The only thing I can’t handle is the string processing, so if anyone can tell me how to find the URL and replace the *s with the actual name and file format, I should be able to do the rest. Or, if my method is stupid and you have a better one, please share.
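
The string-handling step described above can be sketched with grep and sed. This is only a minimal sketch: the function name extract_pic_src is hypothetical, and it assumes the whole img tag sits on one line, as in the snippet quoted in the question.

```shell
# Hypothetical helper: read HTML on stdin and print the src attribute of
# every <img> tag whose name attribute is "pic".
# Assumption: the whole tag is on a single line, as in the quoted markup.
extract_pic_src() {
    # Step 1: keep only the matching <img ...> tags.
    # Step 2: print the quoted value that follows src= inside each tag.
    grep -o "<img[^>]*name=['\"]pic['\"][^>]*>" |
        sed -n "s/.*src=['\"]\([^'\"]*\)['\"].*/\1/p"
}

# Possible use for a single page (not run here):
#   src=$(curl -s 'http://domain.com/id/1' | extract_pic_src)
#   wget "$src"
```

Matching on the name attribute rather than on the image host keeps the extraction working even if the pictures are later served from a different path, at the cost of depending on the page markup.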

2 answers
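# get all pages (curl expands the [1-151468] range; '#1' in the output name is the current number)
curl 'http://domain.com/id/[1-151468]' -o '#1.html'

# extract the image URLs (stop at the closing quote so a match cannot run past the src attribute)
grep -oh "http://pics\.domain\.com/pics/original/[^'\"]*\.jpg" *.html >urls.txt

# download all images, skipping duplicate URLs
sort -u urls.txt | wget -i -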
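
The commands above keep every HTML page on disk and only match .jpg files. A hedged variant that stays closer to the loop in the question is sketched here: it fetches one page at a time, keeps no HTML, and accepts any file extension. The function names fetch_page, image_url_for and download_range are hypothetical, and the sketch assumes each page contains at most one full-size image URL.

```shell
# Hypothetical helper: print the HTML of gallery page number $1.
fetch_page() {
    curl -s "http://domain.com/id/$1"
}

# Print the first full-size image URL found on page $1.
# The match stops at the closing quote, so any file extension is accepted.
image_url_for() {
    fetch_page "$1" |
        grep -o "http://pics\.domain\.com/pics/original/[^'\"]*" |
        head -n 1
}

# Download the images for pages $1 through $2, skipping pages with no match.
download_range() {
    id=$1
    while [ "$id" -le "$2" ]; do
        url=$(image_url_for "$id")
        if [ -n "$url" ]; then
            # -nc: do not download a file that already exists locally
            wget -nc "$url"
        fi
        id=$((id + 1))
    done
}
```

Calling download_range 1 151468 would walk the whole gallery. Because of -nc, an interrupted run can be restarted without downloading existing files again.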

https://www.example.com/image_gallery/watermark/[id].jpg

https://www.example.com/image_gallery/large/[id].jpg

https://www.example.com/image_gallery/[a-zzzzzzzz]/[id].jpg
