I am trying to scrape data from some sites. After a while, the crawler starts raising a `twisted.internet.error.ConnectionLost` error. I do not understand how Twisted works. Because of this error, the crawler also takes ages to finish, and I don't know what makes it so slow. Please suggest some possible reasons. My internet connection is fine.
The following is the error:
2014-02-04 14:22:20+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-02/romanians-reject-euro-loans-after-hungary-disaster-mortgages.html> (failed 1 times): [<twisted.python.failure.Failure <class 'twisted.internet.error.ConnectionLost'>>]
2014-02-04 14:22:20+0530 [bb] INFO: Crawled 20 pages (at 7 pages/min), scraped 0 items (at 0 items/min)
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-03/u-s-said-to-probe-banks-over-sovereign-wealth-fund-deals.html> (failed 1 times): User timeout caused connection failure: Getting http://www.bloomberg.com/news/2014-02-03/u-s-said-to-probe-banks-over-sovereign-wealth-fund-deals.html took longer than 180 seconds..
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://search1.bloomberg.com/search/?content_type=all&page=1&q=ROYAL%20BANK%20OF%20CANADA> (failed 1 times): User timeout caused connection failure: Getting http://search1.bloomberg.com/search/?content_type=all&page=1&q=ROYAL%20BANK%20OF%20CANADA took longer than 180 seconds..
2014-02-04 14:22:57+0530 [bb] DEBUG: Retrying <GET http://www.bloomberg.com/news/2014-02-03/canada-consumer-sentiment-dips-to-8-month-low-on-currency.html> (failed 1 times): User timeout caused connection failure.
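The 180 seconds in the log matches Scrapy's default `DOWNLOAD_TIMEOUT`, so each stalled request holds a download slot for three minutes before it is retried. Below is a minimal sketch of the project settings that control this behaviour. The setting names are standard Scrapy settings, but every value here is an assumption chosen for illustration, not a tested fix for this site, and I have not confirmed that throttling is the actual cause.

```python
# settings.py (sketch): values are illustrative assumptions, tune for your crawl.

# Give up on a stalled request sooner than the 180-second default.
DOWNLOAD_TIMEOUT = 30

# Keep the retry middleware on, with a small cap on retries per request.
RETRY_ENABLED = True
RETRY_TIMES = 2

# Slow the crawl down in case the server drops connections under load.
DOWNLOAD_DELAY = 1.0
CONCURRENT_REQUESTS_PER_DOMAIN = 2

# Let Scrapy adapt the delay to the latency it observes.
AUTOTHROTTLE_ENABLED = True
```

With a shorter timeout, a request that hangs fails quickly and is retried, rather than blocking the crawler for minutes at a time.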
Thanks for the help.