Research and Implementation of Theme-Based Data Crawling Technology with a Python Framework
Authors: Yan Fei; Xiao Pu (Sanjiang University)
Keywords: data crawling; theme-based; crawler; Spring MVC
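Chinese journal title: JSJS
English journal title: Computer Era
Affiliation: School of Computer Science and Engineering, Sanjiang University
Publication date: 2018-11-15
Publisher: Computer Era (计算机时代)
Year: 2018
Issue: No.317
Funding: General Program of Natural Science Research for Higher Education Institutions of Jiangsu Province (17KJD520007)
Language: Chinese
Page: JSJS201811004
Page count: 4
CN: 11
ISSN: 33-1094/TP
Classification number: 14-17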

Abstract

Searching and shopping online have become a necessity of daily life. On many systems, viewing goods or resources requires clicking through multiple pages, and users often feel overwhelmed as browsing time increases. Presenting users with only the necessary data would improve the efficiency of screening resources. This paper uses the Python language together with the popular Spring MVC framework to crawl data from a target website, designs a data crawling module and a data display module, and implements a theme-based crawler framework. Crawling experiments and result tests show that the data of the target website was successfully crawled and displayed on the authors' own page, achieving the expected goal.

