How to run Scrapy from a single .py file

Hi, I'm working with Scrapy. I created a Scrapy project folder with scrapy startproject example and wrote a spider to scrape all the data from a URL. When I launch the spider with the command scrapy crawl spider_name, it works and retrieves the data.

However, I need to run the spider from that single spider file, with a command similar to:

python -u /path/to/spider_file_inside_scrapy_folder_created.py

Is it possible to start a spider without the scrapy crawl command, after creating the project folder containing the spider.py file?

5 answers

Try the runspider command, which runs a spider contained in a single Python file:

scrapy runspider /path/to/spider_file_inside_scrapy_folder_created.py

Yes! Instead of invoking the command through Popen, you can run it from Python like this:

>>> from scrapy.cmdline import execute
>>> execute(['scrapy','crawl','dmoz'])

For a complete example project, see dirbot on GitHub:

https://github.com/scrapy/dirbot


If I understand your question correctly, I think the answer now is to run the spider through the API:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

In your spider class file:

import scrapy

class YouNameSpider(scrapy.Spider):
    name = 'youname'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['www.youname.com']

Create main.py in the YouName project folder:

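import os
import sys
from scrapy.cmdline import execute

# Put this file's directory (the project root) on the import path
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
# Equivalent to running "scrapy crawl youname" from the project folder
execute(['scrapy', 'crawl', 'youname'])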

Yes. First, change to the directory containing xyz.py on the command line, then run:

scrapy runspider xyz.py

To save the output to a file, add the -o option:

scrapy runspider xyz.py -o output.csv

You can also save the output as JSON; Scrapy picks the format from the file extension, e.g. scrapy runspider xyz.py -o output.json

