I'm looking for a tutorial on web crawlers

I need to know more about web crawlers for my personal project, and I would like to get answers to some questions:

1) From what I heard, it looks like Google uses Python for its web crawlers, right?

2) Following on from that, would you say it is a good choice? Is Python the most appropriate language for this kind of thing? Why?

3) What is legal to do with web crawlers, and what is not? I heard that many websites do not really appreciate it when you load too many of their pages, but isn't that what Google does? It looks like a big gray area, and I would like to know how I can make sure I am doing this legally ...

4) If you have a good guide on creating web crawlers (the programming language doesn't matter), I would really appreciate a link to it!

Thank you, and sorry for the mistakes, English is not my native language ...

1 answer

1) From what I heard, it looks like Google uses Python for its web crawlers, is that correct?

An early version of Google used Python for its web crawler. This is stated in early publications dating back to the 90s (see "The Anatomy of a Large-Scale Hypertextual Web Search Engine"). Only a Google employee could tell you whether they still use Python for their crawler today.

2) Following on from that, would you say it is a good choice? Is Python the most suitable language for this kind of thing? Why?

. " Python ?" Python - , .., Python, , Python -, - URL-.

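3) What is legal to do with web crawlers, and what is not? I heard that many websites do not really appreciate it when you load too many of their pages, but isn't that what Google does? It looks like a big gray area, and I would like to know how I can make sure I am doing this legally ...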

, . - , , , IP-. -, GoogleBot, , . - .
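
As a rough illustration of keeping a crawler polite, here is a minimal sketch. It assumes the site publishes a robots.txt file and that the crawler identifies itself as `my-crawler` (a made-up name). It uses Python's standard `urllib.robotparser`; the helper names `load_rules`, `allowed` and `polite_delay`, and the one-second default pause, are assumptions for the example only.

```python
from urllib.robotparser import RobotFileParser

USER_AGENT = "my-crawler"  # hypothetical crawler name, pick your own


def load_rules(robots_lines):
    """Build a parser from the lines of a robots.txt file."""
    parser = RobotFileParser()
    parser.parse(robots_lines)
    return parser


def allowed(rules, url):
    """Return True if robots.txt lets our user agent fetch this URL."""
    return rules.can_fetch(USER_AGENT, url)


def polite_delay(rules, default=1.0):
    """Seconds to wait between requests: Crawl-delay if given, else a default."""
    delay = rules.crawl_delay(USER_AGENT)
    return float(delay) if delay is not None else default
```

In a real crawler you would load the rules from the site itself with `set_url(...)` followed by `read()` on the parser, check `allowed(...)` before every request, and call `time.sleep(polite_delay(rules))` between requests to the same host.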

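4) If you have a good guide on creating web crawlers (the programming language doesn't matter), I would really appreciate a link to it!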

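The basic crawler algorithm is to feed URLs into a FIFO queue. A consumer takes a URL from the queue, fetches it, parses the HTML for links, and adds the URLs it finds back to the queue. Keep track of the URLs you have already seen so that none is fetched twice.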
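
The FIFO idea can be sketched roughly as follows, using only the Python standard library. This is a minimal illustration, not a production crawler: the names `LinkParser`, `fetch_html` and `crawl`, the `max_pages` limit, and the choice to skip any page that fails to download are all assumptions made for the example.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Default fetcher: download a page and decode it as text."""
    with urlopen(url, timeout=10) as response:
        return response.read().decode("utf-8", errors="replace")


def crawl(start_url, fetch=fetch_html, max_pages=50):
    """Breadth-first crawl from start_url; returns pages in visit order."""
    queue = deque([start_url])   # FIFO queue of URLs waiting to be fetched
    seen = {start_url}           # every URL ever queued, to avoid repeats
    visited = []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        try:
            html = fetch(url)
        except Exception:
            continue             # skip pages that cannot be downloaded
        visited.append(url)
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop the #fragment part.
            link, _fragment = urldefrag(urljoin(url, href))
            if link.startswith(("http://", "https://")) and link not in seen:
                seen.add(link)
                queue.append(link)
    return visited
```

Passing the fetcher in as a parameter makes it easy to try the loop against a dictionary of fake pages before pointing it at a real site, for example `crawl("http://example.test/", fetch=lambda url: pages[url])`.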

webcrawler , , , URL-, , , .. , - , , - World Wide Web .
