I now plan to use Scrapy in a more distributed way, and I'm not sure whether the spiders, pipelines, downloaders, schedulers, and the engine each run in separate processes or threads. Is there any documentation about this? Can the number of processes or threads be changed for each component? I know there are two settings, "CONCURRENT_REQUESTS" and "CONCURRENT_ITEMS"; they determine the concurrency of the downloaders and the item pipelines, right? And if I want to deploy spiders, pipelines, and downloaders on different machines, I need to serialize the items, requests, and responses, right? Thanks so much for your help!
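To make the serialization question concrete, here is a rough sketch of what I have in mind, using only the Python standard library. `PendingRequest`, its fields, and the `to_wire` / `from_wire` helpers are names I made up for illustration; they are not part of Scrapy's API, and a real setup would presumably also need to carry callbacks, headers, and so on.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class PendingRequest:
    """Hypothetical stand-in for a crawl request; NOT a Scrapy class."""
    url: str
    method: str = "GET"
    meta: dict = field(default_factory=dict)


def to_wire(req: PendingRequest) -> bytes:
    # Turn the request into JSON bytes that could be pushed onto a shared queue.
    return json.dumps(asdict(req), sort_keys=True).encode("utf-8")


def from_wire(payload: bytes) -> PendingRequest:
    # Rebuild the request on the machine that pops it off the queue.
    return PendingRequest(**json.loads(payload.decode("utf-8")))


original = PendingRequest("https://example.com/page/1", meta={"depth": 2})
payload = to_wire(original)
restored = from_wire(payload)
```

If this is roughly the right idea, then only plain data (strings, numbers, dicts) survives the round trip, so anything like a callback would have to be sent by name and looked up again on the receiving side.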
Thanks Edward.