Scrapy-based Business Website Data Capture (基于Scrapy的商务网站数据抓取)
  • English title: Scrapy-based Business Website Data Capture
  • Author: 任洛
  • Author (English): Ren Luoyi; Department of Computer, Chengdu College of University of Electronic Science and Technology of China
  • Keywords: data crawling; Python; Scrapy
  • Journal code: XXDL
  • Journal name (English): China Computer & Communication
  • Institution: Department of Computer, Chengdu College of University of Electronic Science and Technology of China
  • Publication date: 2018-10-15
  • Publisher: 信息与电脑(理论版)
  • Year: 2018
  • Issue: No.413
  • Language: Chinese
  • Page: XXDL201819025
  • Page count: 2
  • CN: 19
  • ISSN: 11-2697/TP
  • Classification number: 61-62
Abstract
In the era of big data, competition among commercial websites is often competition over data, which requires acquiring massive amounts of it; web crawler technology emerged to meet this need. The author introduces the working principle and main workflow of a web crawler, describes the main third-party Python libraries that support web crawling, then systematically introduces the Scrapy framework, detailing its main components and configuration process, and finally describes how to use the Scrapy command line to crawl data. The method is logically clear and highly practical from an engineering standpoint.
