Scrapy: how to change spider settings after crawling has started?

I cannot change the spider settings from inside the parse method, but there must surely be a way to do it.

For instance:

from scrapy.conf import settings      # legacy (pre-1.0) global settings object
from scrapy.spider import BaseSpider  # legacy base spider class

class SomeSpider(BaseSpider):
    name = 'mySpider'
    allowed_domains = ['example.com']
    start_urls = ['http://example.com']
    settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.FirstPipeline']
    print settings['ITEM_PIPELINES'][0]
    # printed 'myproject.pipelines.FirstPipeline'

    def parse(self, response):
        # ... some code
        settings.overrides['ITEM_PIPELINES'] = ['myproject.pipelines.SecondPipeline']
        print settings['ITEM_PIPELINES'][0]
        # printed 'myproject.pipelines.SecondPipeline'
        item = Myitem()
        item['name'] = 'Name for SecondPipeline'
        return item

However, the item is still processed by FirstPipeline: the new ITEM_PIPELINES value has no effect. How can I change settings after crawling has started? Thanks in advance!

1 answer

If you want different spiders to use different pipelines, give each spider a pipelines attribute naming the pipelines that should handle its items, then check that attribute inside each pipeline:

class MyPipeline(object):

    def process_item(self, item, spider):
        if self.__class__.__name__ not in getattr(spider, 'pipelines',[]):
            return item
        ...
        return item

class MySpider(CrawlSpider):
    pipelines = set([
        'MyPipeline',
        'MyPipeline3',
    ])
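
The opt-in check above can be tried out without a running crawl. The following is only a rough sketch: it does not import Scrapy, and OptInPipeline, handle, SpiderA, SpiderB and run_pipelines are hypothetical names introduced here to stand in for real spiders and for the engine calling process_item on each pipeline in turn.

```python
class OptInPipeline(object):
    """Hypothetical base class: act only for spiders that list this pipeline by name."""

    def process_item(self, item, spider):
        enabled = getattr(spider, 'pipelines', ())
        if self.__class__.__name__ not in enabled:
            return item  # spider did not opt in; pass the item through untouched
        return self.handle(item, spider)

    def handle(self, item, spider):
        raise NotImplementedError


class FirstPipeline(OptInPipeline):
    def handle(self, item, spider):
        item['seen_by'].append('FirstPipeline')
        return item


class SecondPipeline(OptInPipeline):
    def handle(self, item, spider):
        item['seen_by'].append('SecondPipeline')
        return item


class SpiderA(object):  # stand-in for a Scrapy spider class
    name = 'a'
    pipelines = set(['FirstPipeline'])


class SpiderB(object):  # stand-in for a second spider
    name = 'b'
    pipelines = set(['SecondPipeline'])


def run_pipelines(item, spider, pipelines):
    """Stand-in for the engine: pass the item through every pipeline in order."""
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider)
    return item


all_pipelines = [FirstPipeline(), SecondPipeline()]
item_a = run_pipelines({'seen_by': []}, SpiderA(), all_pipelines)
item_b = run_pipelines({'seen_by': []}, SpiderB(), all_pipelines)
print(item_a['seen_by'])  # ['FirstPipeline']
print(item_b['seen_by'])  # ['SecondPipeline']
```

In a real project every such pipeline would still have to be listed in ITEM_PIPELINES in the project settings, since the check only skips work inside pipelines that are already enabled.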

If you want different items to be handled by different pipelines, check the item type inside the pipeline:

    class MyPipeline2(object):
        def process_item(self, item, spider):
            if isinstance(item, MyItem):
                ...
                return item
            return item
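
Applied to the situation in the question, this could mean yielding two item types from the same parse method and letting each pipeline pick up only its own type. This is a minimal sketch without Scrapy: FirstItem, SecondItem, the two pipeline classes and parse_stub are hypothetical names, and plain dict subclasses stand in for real Item classes.

```python
class FirstItem(dict):
    """Hypothetical item type intended for FirstItemPipeline."""


class SecondItem(dict):
    """Hypothetical item type intended for SecondItemPipeline."""


class FirstItemPipeline(object):
    def process_item(self, item, spider):
        if isinstance(item, FirstItem):
            item['handled_by'] = 'FirstItemPipeline'
        return item  # always return the item so later pipelines see it


class SecondItemPipeline(object):
    def process_item(self, item, spider):
        if isinstance(item, SecondItem):
            item['handled_by'] = 'SecondItemPipeline'
        return item


def parse_stub():
    """Stand-in for a spider's parse method yielding both item types."""
    yield FirstItem(name='Name for FirstPipeline')
    yield SecondItem(name='Name for SecondPipeline')


pipelines = [FirstItemPipeline(), SecondItemPipeline()]
results = []
for item in parse_stub():
    for pipeline in pipelines:
        item = pipeline.process_item(item, spider=None)
    results.append(item)

for item in results:
    print(type(item).__name__, item['handled_by'])
```

With this layout the choice of pipeline is made by which item class parse yields, so no setting needs to change while the crawl is running.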