Research on Key Technologies in P2P Content Supervision
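Abstract
In recent years, P2P network applications, typified by P2P file sharing and P2P streaming media, have grown rapidly. At the same time, illicit network resources have also spread quickly through these applications, causing many network and social problems. How to supervise P2P content and information effectively has become a key problem in P2P research that urgently needs solving.
     P2P content supervision comprises three key steps: collecting resource and node information, selecting management targets, and controlling the spread of illicit resources. Collecting resource and node information means gathering, in line with the supervision goals, data such as the resources in the target P2P system, their publication information, and node information; one active approach currently in use is crawling. Selecting management targets means distinguishing normal resources from illicit ones, according to the goals and scope of content management and on the basis of the collected information. Controlling the spread of illicit resources means managing the propagation of the selected target resources by technical and non-technical means; current strategies concentrate on reducing index accuracy, implemented by having a crawler system publish false index information in the managed P2P system. Existing techniques, however, have the following problems: 1) as P2P technology has evolved, some earlier data collection techniques, such as traditional port-based management methods, can no longer do the job; moreover, when collecting data from P2P systems built on newer architectures such as DHT, existing collection strategies show clear defects such as poor coverage and low efficiency; 2) judging whether a resource is illicit solely from its publication information ignores the effect on supervision of the resource's real availability and of differences in the attention that different resources receive; 3) the current control strategy of reducing index accuracy works poorly, because most forged entries can be identified from content features and node features.
     To address these problems, this dissertation analyzes the distribution characteristics of resources in P2P systems and the state of research on content supervision, and focuses on the collection strategy for P2P resource publication information, the method for judging resource availability, and the principles and mechanisms of content propagation and its control. The main results are as follows:
     First, for collecting resource publication information in P2P file sharing applications that use mapping-type indexes, this dissertation proposes a name collection strategy based on family resemblance between names. Exploiting the fact that names are organized with partial similarity, the strategy uses the unknown parts of known names as the initial condition of the next iteration and controls a preset search-term vector, and can thereby largely complete a snapshot of the resource publication information in the target system. In an experiment on a real DHT-based P2P system, starting from a single search term as the initial vector, about 10 million publication entries were retrieved, which indirectly verifies the feasibility of the strategy.
     Second, to overcome the limitation that current P2P content supervision judges content by name alone, this dissertation proposes a content availability judgment based on statistical inference, which analyzes the overall availability level from the availability of a sample. Unlike the traditional approach of checking whether content matches its published name, it uses the number of names with different meanings associated with one piece of content as the indicator of its availability level: the more associated names, the poorer the availability. Statistical inference is then used to judge the overall availability level of that class of content. Compared with judging the actual content of a resource by its name, the proposed method 1) effectively reduces the number of wrong targets in the supervision system, and 2) on that basis enables learning-algorithm-based selection of supervision targets along the name and availability dimensions.
     Third, current strategies for managing the spread of illicit resources are limited to changing the proportion of available content among all content returned by a search. To go beyond this, the dissertation draws on information theory and describes a content search as a channel through which content, via its publication information, travels from source to sink. Based on this channel model it gives two management strategies: 1) changing the source probability distribution through the current practice of adding versions and copies; 2) changing channel characteristics such as content features and node features to influence the decisions normal users make when judging whether searched content is usable. Both aim ultimately to reduce the average mutual information and thereby lower the probability that content propagates successfully. Finally, experiments on a real P2P system used statistical methods such as multiple linear regression and analysis of variance to analyze the key factors affecting user decisions. This information-theoretic analysis provides a theoretical basis for content propagation control and extends existing management strategies, which target only the source.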
In recent years, P2P network applications, represented by P2P file sharing and P2P streaming media applications, have developed rapidly. At the same time, illicit network resources have also spread rapidly through P2P applications and caused many network and social problems. How to monitor P2P content and information effectively has become an urgent key issue in the P2P research field.
     The regulation of P2P content includes three key steps: collecting resource and node information, selecting management targets, and controlling the spread of illicit resources. Collecting resource and node information means gathering the resources of the target P2P system, their publication information, and node information according to the regulatory objectives; the main approach currently used is crawling. Selecting management targets means distinguishing normal from illicit resources, based on the objectives and scope of content management and on the collected resource and node information. Controlling the dissemination of illicit resources means managing the spread of the selected target resources through technical and non-technical means; the major current strategies focus on reducing the accuracy of the index, which is achieved by publishing wrong index information in P2P systems through a crawler system. However, existing technology has the following problems: 1) with the development of P2P technology, some of the original data collection techniques, such as traditional port-based management methods, can no longer complete the task of data collection; moreover, when collecting data from P2P systems based on newer architectures such as DHT, existing collection strategies have obvious defects, such as poor comprehensiveness and low efficiency; 2) relying only on the publication information of a resource to judge whether it is illicit ignores the influence on regulatory effect of the resource's real availability and of the differences in attention that different resources receive; 3) the currently adopted strategy of poisoning the index has little effect, because ordinary users can use the features of resources or nodes to distinguish usable entries from unusable ones.
     To solve these problems, this dissertation studies the distribution of P2P resources and the current state of P2P censorship, and focuses on the strategy for gathering P2P information, the validity of resources, the mechanism by which resources spread, and ways to control that propagation.
     Firstly, to improve the completeness of metadata gathering in DHT-based systems, a family-resemblance-based metadata snapshot strategy is proposed. Exploiting the partial similarity between two metadata entries, the snapshot strategy iterates continuously by taking the unknown part of any known metadata entry as the next search input. In a real DHT-based system where the strategy was deployed, about 10 million metadata entries were acquired from only 1 search term, which indirectly verifies the feasibility of the family-resemblance-based strategy.
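The iteration described above can be sketched roughly as follows. This is only an illustration under stated assumptions: the `search` callback, the whitespace tokenizer, the `max_queries` cap, and the toy in-memory index are all hypothetical stand-ins, not the crawler or the P2P system used in the dissertation.

```python
from collections import deque


def snapshot(search, seed_terms, tokenize=str.split, max_queries=1000):
    """Collect names by repeatedly searching on unseen tokens of known names.

    `search(term)` is a hypothetical callback returning the names that
    match `term`; `max_queries` is an assumed safety cap on iterations.
    """
    seen_terms = set(seed_terms)
    queue = deque(seed_terms)
    names = set()
    queries = 0
    while queue and queries < max_queries:
        term = queue.popleft()
        queries += 1
        for name in search(term):
            if name in names:
                continue
            names.add(name)
            # Every token not searched yet becomes a future search term.
            for token in tokenize(name):
                if token not in seen_terms:
                    seen_terms.add(token)
                    queue.append(token)
    return names


# Toy stand-in for a keyword index (illustrative data only).
TOY_NAMES = [
    "alpha beta",
    "beta gamma",
    "gamma delta",
    "omega psi",  # shares no token with the other names
]


def toy_search(term):
    return [n for n in TOY_NAMES if term in n.split()]


found = snapshot(toy_search, ["alpha"])
```

In this toy run a single seed term reaches every name linked to it through a chain of shared tokens, while "omega psi" stays undiscovered. That is one reason a strategy of this kind can only approximate a full snapshot.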
     Secondly, to refine the granularity of target selection in censorship, a statistical-inference-based method for differentiating resource validity is proposed. The relation between a resource and its related metadata is transformed into a relation between metadata entries, which is much easier to handle. Thus, a standard Wilcoxon test can be used to tell whether a series of resources is valid or not from the number of metadata entries associated with them. With this inference, 1) a large number of invalid resources can be excluded from the censorship targets; 2) by expanding the observations, learning algorithms can be applied to the target selection procedure.
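As a rough illustration of this kind of inference, the sketch below runs a one-sided Wilcoxon rank-sum test (normal approximation with tie correction, no continuity correction) on the number of names attached to each resource. The abstract does not say which samples are compared or which variant of the Wilcoxon test is used, so the two-sample setup, the function name `rank_sum_test`, and the counts are all assumptions made for the example.

```python
import math


def rank_sum_test(sample, reference):
    """One-sided Wilcoxon rank-sum test using a normal approximation.

    Returns (U, p). A small p suggests that values in `sample` tend to be
    larger than values in `reference`.
    """
    n1, n2 = len(sample), len(reference)
    pooled = sorted([(v, 0) for v in sample] + [(v, 1) for v in reference])
    ranks = [0.0] * len(pooled)
    tie_term = 0
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j][0] == pooled[i][0]:
            j += 1
        average_rank = (i + 1 + j) / 2.0  # mean of ranks i+1 .. j
        for k in range(i, j):
            ranks[k] = average_rank
        t = j - i
        tie_term += t ** 3 - t
        i = j
    rank_sum = sum(r for r, (_, group) in zip(ranks, pooled) if group == 0)
    u = rank_sum - n1 * (n1 + 1) / 2.0
    n = n1 + n2
    mean = n1 * n2 / 2.0
    variance = n1 * n2 / 12.0 * ((n + 1) - tie_term / (n * (n - 1)))
    if variance == 0:
        return u, 1.0
    z = (u - mean) / math.sqrt(variance)
    p = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail probability
    return u, p


# Hypothetical name counts per resource (illustrative numbers only).
reference_counts = [1, 1, 2, 2, 3, 1, 2, 4]  # resources assumed valid
suspect_counts = [9, 11, 12, 14, 15, 18]     # resources under inspection

u, p = rank_sum_test(suspect_counts, reference_counts)
suspect_looks_invalid = p < 0.01
```

With these made-up counts every suspect resource carries more names than any reference resource, so the test rejects the hypothesis that both groups follow the same distribution. Under the abstract's premise, such a group would be treated as invalid.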
     Thirdly, to overcome the current limitation of propagation control, which relies on inserting invalid copies or metadata, a mechanism based on an information-theoretic channel model is proposed. Through this channel model, two ways to control resource spread are identified: 1) the currently adopted control strategy amounts to redistributing the information source; and 2) a series of features of resources and nodes can strongly affect the choices of ordinary users. Both aim to decrease I(X; Y), the measure of the effect of propagation control. Finally, a multivariable regression is used to show that the historical download count and the size of a file are the key factors affecting users' choices in P2P file sharing systems. In addition, this information-theoretic analysis provides theoretical support for the current strategy and proposes a new way to implement propagation control.
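To make the role of I(X; Y) concrete, the sketch below computes the mutual information of a small discrete channel and compares the two kinds of intervention. The two-symbol model (X is whether an entry is genuinely valid, Y is what the user concludes) and all probabilities are assumptions chosen for illustration; the abstract does not give the dissertation's actual channel.

```python
import math


def mutual_information(p_x, p_y_given_x):
    """I(X; Y) in bits for a source distribution and a channel matrix.

    p_x[i] is P(X = i); p_y_given_x[i][j] is P(Y = j | X = i).
    """
    n_outputs = len(p_y_given_x[0])
    p_y = [
        sum(p_x[i] * row[j] for i, row in enumerate(p_y_given_x))
        for j in range(n_outputs)
    ]
    total = 0.0
    for i, row in enumerate(p_y_given_x):
        for j, p in enumerate(row):
            joint = p_x[i] * p
            if joint > 0:
                total += joint * math.log2(p / p_y[j])
    return total


# X: entry is valid / invalid. Y: user judges it valid / invalid.
perfect_judgement = [[1.0, 0.0], [0.0, 1.0]]
confused_judgement = [[0.6, 0.4], [0.4, 0.6]]

# Balanced source, users never misjudge.
baseline = mutual_information([0.5, 0.5], perfect_judgement)
# Strategy 1: skew the source by adding invalid versions and copies.
poisoned = mutual_information([0.1, 0.9], perfect_judgement)
# Strategy 2: alter features so that users misjudge more often.
confusing = mutual_information([0.5, 0.5], confused_judgement)
```

In this toy model both interventions lower I(X; Y) relative to the baseline of 1 bit, which matches the abstract's statement that the two strategies share one goal.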
