How to run scrapy with py file

Question


Hi, I'm working with Scrapy. I created a Scrapy project folder with scrapy startproject example and wrote a spider to scrape all the data from a URL. I ran the spider with the command scrapy crawl spider_name; it works and returns the data.

But I have a requirement to run Scrapy from a single spider file. I mean running one file with something like

python -u /path/to/spider_file_inside_scrapy_folder_created.py

Is it possible to run a spider without the scrapy crawl command, after creating the project folder with the spider.py file?

+5

python scrapy

shiva krishna Sep 29 '12 at 4:17


5 answers

Yes! Instead of invoking the command via Popen, you can run it from Python:

>>> from scrapy.cmdline import execute
>>> execute(['scrapy','crawl','dmoz'])

I tested this with the dirbot project on Github:

https://github.com/scrapy/dirbot

+10

damzam 29 . '12 6:28

I think the answer (if I understand your question correctly) is now to use the API:

import scrapy
from scrapy.crawler import CrawlerProcess

class MySpider(scrapy.Spider):
    # Your spider definition
    ...

process = CrawlerProcess({
    'USER_AGENT': 'Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 5.1)'
})

process.crawl(MySpider)
process.start()

+2

mikebridge Mar 14 '17 at 2:28


Your spider class file:

import scrapy

class YouNameSpider(scrapy.Spider):
    name = 'youname'
    # allowed_domains takes bare domain names, not full URLs
    allowed_domains = ['www.YouName.com']

Create main.py in the YouName project folder:

from scrapy.cmdline import execute
import os, sys
sys.path.append(os.path.dirname(os.path.abspath(__file__)))
execute(['scrapy', 'crawl', 'youname'])

0

Ghost clock Oct 16 '17 at 6:38


Yes. First, navigate on the command line to the directory that contains the xyz.py file. Then run:

scrapy runspider xyz.py

And if you want to save the output, you can write:

scrapy runspider xyz.py -o output.csv

You can also save the output as JSON, for example with -o output.json.
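If the spider file has to be started from another Python script, the same runspider commands could be wrapped with the standard library. This is only a sketch: the helper names build_runspider_command and run_spider_file are made up for illustration, and it assumes the scrapy executable is on PATH.

```python
import shutil
import subprocess


def build_runspider_command(spider_file, output_file=None):
    # Mirrors the commands above: scrapy runspider xyz.py [-o output.csv]
    command = ["scrapy", "runspider", spider_file]
    if output_file is not None:
        command += ["-o", output_file]
    return command


def run_spider_file(spider_file, output_file=None):
    # Fail early with a clear message if scrapy is not installed on PATH.
    if shutil.which("scrapy") is None:
        raise RuntimeError("scrapy executable not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(
        build_runspider_command(spider_file, output_file), check=True
    )
```

For example, run_spider_file("xyz.py", "output.csv") would run the second command shown above.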

0

Ashish kapil Oct 16 '17 at 7:51


Steven almeroth · Accepted Answer · 2012-10-13T02:35:51+0000

Try the runspider command:

scrapy runspider /path/to/spider_file_inside_scrapy_folder_created.py
