Scrapy-based Business Website Data Capture

Original title: 基于Scrapy的商务网站数据抓取
Author: Ren Luoyi (任洛漪)
Affiliation: Department of Computer, Chengdu College of University of Electronic Science and Technology of China (电子科技大学成都学院计算机系)
Keywords: data crawling; Python; Scrapy
Journal code: XXDL
Journal: China Computer & Communication
Publication date: 2018-10-15
Publisher: 信息与电脑(理论版)
Year: 2018
Issue: No.413
Language: Chinese
Pages: XXDL201819025
Page count: 2
CN: 19
ISSN: 11-2697/TP
Classification number: 61-62

Abstract

In the era of big data, competition among commercial websites is often a competition over data, which requires acquiring massive amounts of it; web crawler technology emerged in response. This paper introduces the working principle and main workflow of web crawlers, describes the main third-party Python libraries that support web crawling, then systematically introduces the Scrapy framework, detailing its main components and configuration process, and finally describes how to use the Scrapy command line to crawl data. The method is logically clear and highly practical for engineering use.
