Research on Intelligent Monitoring Platform of Network Service

Original title (Chinese): 网络服务智能监测平台的研究
Author: Xue Tao (薛涛)
Thesis level: Master's
Discipline: Communication and Information Systems
Keywords: Network service; Information acquisition; Clustering; Hyperlink analysis; PageRank algorithm
Degree year: 2008
Supervisor: Meng Siyi (孟嗣仪)
Discipline code: 081001
Degree-granting institution: Beijing Jiaotong University
Submission date: 2008-06-01

Abstract

In recent years, with the rapid development of network infrastructure and related technologies and the steady growth in the number of network users, the Internet has gradually become a mainstream venue for exchanging information. Major websites offer a variety of communication services to attract visitors; the most common platforms today are forums and blogs. As Web 2.0 technology matures and spreads, the role of content provider is shifting from the developers and operators of websites to the broad population of network users. Because the Internet is open and its oversight mechanisms lag behind, the quality of online content is uneven. Without timely guidance of public opinion, correct views and the facts of a matter may be buried under misinformation, with adverse effects on social harmony. Guiding online opinion correctly first requires monitoring of, and early warning about, online topics and events. The goal of this research is to give researchers of online public opinion a data basis for monitoring and early warning, and to propose a feasible scheme.
     Drawing on a study of related technologies in China and abroad, and taking into account the needs of researchers in the laboratory and the actual situation of the Beijing Internet Propaganda and Management Office, this dissertation designs an Intelligent Monitoring Platform of Network Service (IMPoNS). The platform combines three subsystems with different functions into one large-scale monitoring platform.
     Building on an in-depth analysis of the basic profile of Internet users, the characteristics of network services, and existing network monitoring software, the dissertation studies several key technologies in depth with respect to the platform's design principles and intended functions. It then adapts and optimizes these technologies to the characteristics of the system, laying the groundwork for the detailed design of each module.
     On this basis, the dissertation proposes the overall framework of IMPoNS and explains each module in the framework in terms of technology and workflow. It designs a network service discovery module and, within it, proposes a method for discovering forum and blog services; practical testing shows the method to be reliable enough to serve as the core of the module. Weighing the strengths and weaknesses of existing systems, it designs a network information acquisition module that integrates several technologies. In designing the cyclic monitoring algorithm, it proposes an improvement to the PageRank algorithm developed by Google and makes the improved algorithm part of the monitoring algorithm; theoretical analysis shows the improvement to be effective. Finally, the dissertation completes the design of the whole platform.
     There are thirteen diagrams, seven tables and forty references in this dissertation.
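
For readers unfamiliar with the algorithm named in the abstract, the following is a minimal sketch of the standard PageRank power iteration. It is not the improved variant proposed in the dissertation, which the abstract does not describe. The function name `pagerank`, the damping factor of 0.85, the convergence tolerance, and the small example graph are all illustrative assumptions.

```python
def pagerank(links, damping=0.85, tol=1e-10, max_iter=200):
    """Standard PageRank by power iteration (illustrative sketch).

    links maps each page name to the list of pages it links to.
    Returns a dict mapping each page to its rank; ranks sum to 1.
    """
    # Collect every page that appears as a source or as a link target.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    pages = sorted(pages)
    n = len(pages)

    # Start from a uniform distribution.
    rank = {p: 1.0 / n for p in pages}

    for _ in range(max_iter):
        # Rank held by pages without outgoing links is spread evenly.
        dangling = sum(rank[p] for p in pages if not links.get(p))
        base = (1.0 - damping) / n + damping * dangling / n
        new_rank = {p: base for p in pages}

        # Each page passes an equal share of its rank along each link.
        for page, targets in links.items():
            if targets:
                share = damping * rank[page] / len(targets)
                for target in targets:
                    new_rank[target] += share

        # Stop once the ranks no longer change noticeably.
        delta = sum(abs(new_rank[p] - rank[p]) for p in pages)
        rank = new_rank
        if delta < tol:
            break
    return rank


# Hypothetical link graph: a forum page, two pages that link back to it,
# and an isolated page with no links at all.
example_links = {
    "forum": ["blog", "news"],
    "blog": ["forum"],
    "news": ["forum"],
    "orphan": [],
}
ranks = pagerank(example_links)
```

In this toy graph the forum page receives links from both other linked pages and therefore ends up with the highest rank, which is the kind of signal a monitoring loop could use to decide which sources to revisit first.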

