Scrapy – Crawling
Description
To run your spider, execute the following command from inside your first_scrapy directory:
scrapy crawl first
Here, first is the name that was given to the spider when it was created.
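If the crawl needs to be started from a script rather than typed by hand, the same command can be wrapped with Python's standard subprocess module. This is only a minimal sketch: the helper names build_crawl_command and run_crawl are hypothetical, and it assumes the scrapy executable is on the PATH and that the project lives in a directory named first_scrapy.

```python
import subprocess


def build_crawl_command(spider_name):
    """Return the argument list equivalent to `scrapy crawl <spider_name>`."""
    return ["scrapy", "crawl", spider_name]


def run_crawl(spider_name, project_dir="first_scrapy"):
    """Run the crawl from inside the project directory (assumed location).

    check=True raises subprocess.CalledProcessError if the command
    exits with a non-zero status.
    """
    return subprocess.run(
        build_crawl_command(spider_name),
        cwd=project_dir,
        check=True,
    )


# Show the command that run_crawl("first") would execute.
print(" ".join(build_crawl_command("first")))
```

Calling run_crawl("first") would then launch the crawl itself; the argument must match the spider's name exactly.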
Once the spider has finished crawling, you will see output like the following:
2016-08-09 18:13:07-0400 [scrapy] INFO: Scrapy started (bot: tutorial)
2016-08-09 18:13:07-0400 [scrapy] INFO: Optional features available: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Overridden settings: {}
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled extensions: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled downloader middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled spider middlewares: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Enabled item pipelines: ...
2016-08-09 18:13:07-0400 [scrapy] INFO: Spider opened
2016-08-09 18:13:08-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Resources/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200) <GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> (referer: None)
2016-08-09 18:13:09-0400 [scrapy] INFO: Closing spider (finished)
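As the output shows, there is one log line for each URL. The (referer: None) part indicates that these URLs are start URLs, so no other page referred the spider to them. Next, you should see two new files, Books.html and Resources.html, created in your first_scrapy directory.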
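To make the structure of those "Crawled" log lines concrete, here is a small sketch that pulls the status code, URL, and referer out of one line, using only the Python standard library. The helper names parse_crawled and filename_for are hypothetical. The file-naming rule (last path segment plus ".html") is an assumption inferred from the file names Books.html and Resources.html; the spider's actual code is not shown in this section.

```python
import re
from urllib.parse import urlparse

# Matches the part of a log line that looks like:
#   Crawled (200) <GET http://example.com/a/> (referer: None)
CRAWLED = re.compile(r"Crawled \((\d{3})\) <GET (\S+)> \(referer: (.+?)\)")


def parse_crawled(line):
    """Return (status, url, referer) for a 'Crawled' log line, else None."""
    match = CRAWLED.search(line)
    if match is None:
        return None
    status, url, referer = match.groups()
    # The log prints the text "None" when a request has no referer.
    return int(status), url, None if referer == "None" else referer


def filename_for(url):
    """Assumed naming rule: last non-empty path segment plus '.html'."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    return segments[-1] + ".html"


log_line = (
    "2016-08-09 18:13:09-0400 [scrapy] DEBUG: Crawled (200) "
    "<GET http://www.dmoz.org/Computers/Programming/Languages/Python/Books/> "
    "(referer: None)"
)

status, url, referer = parse_crawled(log_line)
print(status, referer, filename_for(url))  # prints: 200 None Books.html
```

A referer of None here corresponds to a start URL; requests that follow links from an already crawled page would carry that page's URL instead.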