Research on Search Engines Based on Ajax Technology

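English title: The Research of Search Engine Based on Ajax
Author: Xiao Zhuolei (肖卓磊)
Degree level: Master's
Discipline: Circuits and Systems
Chinese keywords: 搜索引擎 ; Ajax ; 网络蜘蛛 ; PageRank
English keywords: Search Engine ; Ajax ; Web Spider ; PageRank
Degree year: 2009
Advisor: Zhou Yunyao (周云耀)
Discipline code: 080902
Degree-granting institution: Wuhan University of Technology (武汉理工大学)
Thesis submission date: 2009-05-01
Chair of the defense committee: Liu Quan (刘泉)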

Abstract

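Search engines are a technology that has developed gradually since 1995, driven by the rapid growth of information on the Web. A search engine gathers and discovers information on the Internet according to a defined strategy, then interprets, extracts, organizes, and processes that information to provide a retrieval service to users, thereby acting as a navigation aid. Search engine technology has consequently become a focus of research and development in both the computer industry and academia.
     A search engine represents, stores, organizes, and provides access to information items on the network. With a search engine, users can search a vast amount of online information and quickly locate information previously unknown to them. Search engines are at the core of Internet information retrieval technology. At present, more than a dozen search engines, including Chinese-language ones, are in wide use on the Internet, such as general-purpose search engines led by Google and various vertical search tools organized by industry. Compared with their foreign counterparts, however, Chinese search engines still have many problems: low coverage, low precision, poor retrieval accuracy, slow updates, an inability to keep up with dynamic changes in online information, and difficulty controlling and managing information content.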
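     This thesis analyzes the history and current state of search engines and, to address some weaknesses of current search engines, builds an Ajax-based search engine that combines XML data with the emerging Ajax technology. Efficient service is a key measure of a search engine; with Ajax, a search engine can maintain service quality while further improving its usability. Unlike the one-shot request-response model of a traditional search engine, an asynchronous Ajax-based search engine can complete its data requests to the server in several steps. The Ajax engine first requests the style sheets, control code, and the most essential data from the server and displays them in the browser. Then, without interrupting the user, JavaScript drives the XMLHttpRequest object to keep requesting more data from the server in the background and to obtain the current state of the target page, and it manipulates the DOM to replace the data portion of the page. The user can go on browsing richer content without intervening or waiting; throughout the process only the displayed content changes, and the page is never reloaded.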
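The staged request pattern described above can be illustrated with a small simulation. This is only a sketch under stated assumptions: in a browser the work would be done by JavaScript, XMLHttpRequest, and the DOM, whereas here Python's `asyncio` stands in for the asynchronous engine, a dictionary stands in for the page, and `fetch`, `staged_search`, and the delays are hypothetical names and values chosen for illustration.

```python
import asyncio


async def fetch(resource: str, delay: float) -> str:
    """Hypothetical stand-in for one round trip to the server."""
    await asyncio.sleep(delay)  # simulated network latency
    return f"[{resource}]"


async def staged_search(query: str, page: dict) -> None:
    """Load the essential data first, then fetch more in the background."""
    # Step 1: request only the most essential data and show it immediately.
    page["results"] = await fetch(f"top results for {query}", 0.01)
    page["log"].append("first render")

    async def load_more() -> None:
        # Step 2: a background request that does not block the user.
        extra = await fetch(f"more results for {query}", 0.05)
        # Only the data portion of the "page" is updated; nothing is reloaded.
        page["results"] += extra
        page["log"].append("updated in place")

    background = asyncio.create_task(load_more())
    # The user keeps reading while the background request is in flight.
    page["log"].append("user keeps reading")
    await background
```

Calling `asyncio.run(staged_search("ajax", page))` with `page = {"results": "", "log": []}` fills `page["results"]` in two steps, and the log records that the user could keep reading between the first render and the in-place update.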
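     This thesis improves the PageRank algorithm by adding page deduplication, which makes the search engine faster. In addition, a JavaScript parser is added to the web spider, which intercepts and analyzes the data returned by Ajax asynchronous requests and thereby retrieves more page content.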
With the rapid growth of Web information, search engines have developed gradually as a new technology since 1995. It is extremely difficult for users to find information in a vast ocean of data, and search engines emerged precisely to solve this problem. A search engine finds and collects information according to a certain strategy, and then comprehends, extracts, organizes, and processes that information. It also serves as a navigation aid. Therefore, search engine technology has become a target of research and development in both the computer industry and academia.
     A search engine is designed to store, organize, and provide access to information items on the network. It can be used to search a huge amount of information on the network and to find unknown information quickly. Search engines are the core of Internet information retrieval technology. At present, no fewer than ten search engines, including Chinese ones, are widely used on the Internet, such as general-purpose search engines led by Google and various vertical Web search tools classified by industry. However, Chinese search engines still lag well behind similar products abroad, with problems such as low coverage, low precision, poor retrieval accuracy, slow updating, an inability to track dynamic changes in network information, and difficulty controlling and managing content.
     The history and present status of search engines are analyzed. To address the weaknesses of current search engines, an Ajax-based search engine is constructed by combining Ajax techniques with XML data sets. A traditional Web service is based on a single request-response exchange. With Ajax, the client can request data from the server more than once. After the necessary data and main control code have been downloaded from the server, the user can read the main information while transmission continues in the background: the Ajax engine keeps sending requests to the server to get more information. When new information arrives, the old data in the Web page is replaced through the DOM without refreshing the screen.
     The search engine is made faster by improving the PageRank algorithm and eliminating duplicate pages from the index. The search engine can also get more information from Web pages by extracting the data returned by Ajax requests.
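
The abstract does not say how deduplication is combined with PageRank, so the following is only a minimal sketch of one plausible arrangement, not the thesis's method. It assumes that duplicates are detected by hashing whitespace-normalized page text, that duplicate URLs are merged into the first URL seen, and that standard power-iteration PageRank then runs on the smaller merged link graph. The function names, the damping factor 0.85, and the iteration count are all illustrative assumptions.

```python
import hashlib


def canonical_map(pages: dict[str, str]) -> dict[str, str]:
    """Map each URL to the first URL seen with the same normalized text."""
    first_seen: dict[str, str] = {}
    canon: dict[str, str] = {}
    for url, text in pages.items():
        normalized = " ".join(text.split()).lower()
        digest = hashlib.sha1(normalized.encode("utf-8")).hexdigest()
        canon[url] = first_seen.setdefault(digest, url)
    return canon


def pagerank(links: dict[str, list[str]], damping: float = 0.85,
             iterations: int = 50) -> dict[str, float]:
    """Standard power-iteration PageRank over an adjacency-list graph."""
    nodes = sorted(set(links) | {t for ts in links.values() for t in ts})
    n = len(nodes)
    rank = {u: 1.0 / n for u in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly over all pages.
        dangling = sum(rank[u] for u in nodes if not links.get(u))
        new = {u: (1.0 - damping) / n + damping * dangling / n for u in nodes}
        for u, targets in links.items():
            for t in targets:
                new[t] += damping * rank[u] / len(targets)
        rank = new
    return rank


def deduplicated_pagerank(pages: dict[str, str],
                          links: dict[str, list[str]]) -> dict[str, float]:
    """Merge duplicate pages, remap their links, then rank the merged graph."""
    canon = canonical_map(pages)
    merged: dict[str, set[str]] = {}
    for src, targets in links.items():
        s = canon.get(src, src)
        out = merged.setdefault(s, set())
        for t in targets:
            c = canon.get(t, t)
            if c != s:  # drop self-links created by merging duplicates
                out.add(c)
    return pagerank({u: sorted(ts) for u, ts in merged.items()})
```

For example, if pages `"a"` and `"b"` carry the same text apart from case and spacing, `deduplicated_pagerank` ranks only `"a"` and the remaining distinct pages, so each iteration works on a smaller graph.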

