Research and Implementation of a Collaborative Spam Filtering System
Abstract
Since the Internet became widespread, e-mail has gradually become one of the convenient means of communication in people's lives. However, the spam that came with it has spread like a plague, polluting the network environment, consuming large amounts of transmission, storage, and computing resources, and disrupting the normal operation of the network. Industry observers estimate that once spam exceeds one third of total Internet data traffic, it will create enormous storage demands and may even threaten the effectiveness of information security systems. How to control spam effectively is a problem faced by the whole world and one that urgently needs solving on today's Internet. Although some systems filter spam with traditional techniques, these techniques have many shortcomings. Designing an effective spam filtering system is therefore of great significance.
     In response to the current flood of spam, this thesis surveys a large body of Chinese and international anti-spam literature, analyzes the popular spam filtering methods, and studies collaborative anti-spam methods in particular depth. Based on a comparison and analysis of existing collaborative spam filtering systems, it proposes a collaborative anti-spam system model built on a P2P-Chord network. The system consists of two parts: a server network and clients. Its workflow is as follows: a CopyRank threshold is first set for the system; the CopyRank values of messages are then tallied across the collaborative-filtering P2P-Chord network; a message whose CopyRank value is higher than the threshold is judged to be spam, and otherwise it is treated as legitimate mail. To keep spammers from evading the filter by altering message content, the client and the server generate fingerprints with the Nilsimsa and Checksum fingerprint algorithms, respectively. The client plug-in also integrates a Bayesian filter, so that users can filter spam locally on the basis of past mail without sending message fingerprints to the collaborative filtering community, which greatly reduces network communication overhead. A prototype system, AntiSpam, and an Outlook 2003 client plug-in, AntiSpamClient, have been implemented, and experimental results show that the system has good spam filtering performance.
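The threshold workflow described in the abstract can be illustrated with a minimal single-process sketch. It is not the AntiSpam implementation and rests on several assumptions: CopyRank is taken to mean the number of times a fingerprint has been reported to the network (the abstract does not define it); an exact SHA-1 hash of normalised text stands in for both the Nilsimsa and Checksum fingerprints, so unlike Nilsimsa it does not tolerate edited content; and `ToyRing` is a hypothetical stand-in for the server network that keeps only Chord's successor rule for placing keys, with no networking or finger tables. The local Bayesian filter is not covered.

```python
import hashlib
from bisect import bisect_left
from collections import Counter


def fingerprint(body: str) -> str:
    """Checksum-style fingerprint: SHA-1 of lower-cased, whitespace-normalised text.

    Assumption: a stand-in for the thesis's fingerprint algorithms.
    """
    normalised = " ".join(body.lower().split())
    return hashlib.sha1(normalised.encode("utf-8")).hexdigest()


def ring_id(key: str) -> int:
    """Map a string to a 160-bit identifier on the ring via SHA-1."""
    return int.from_bytes(hashlib.sha1(key.encode("utf-8")).digest(), "big")


class ToyRing:
    """Hypothetical in-memory stand-in for the server network."""

    def __init__(self, node_names):
        self.node_ids = sorted({ring_id(name) for name in node_names})
        # Each node keeps a tally of the fingerprints it is responsible for.
        self.counts = {node_id: Counter() for node_id in self.node_ids}

    def successor(self, key_id: int) -> int:
        """First node whose identifier is >= key_id, wrapping around the ring."""
        idx = bisect_left(self.node_ids, key_id)
        return self.node_ids[idx % len(self.node_ids)]

    def report(self, fp: str) -> int:
        """Record one sighting of a fingerprint; return the updated tally."""
        node = self.successor(ring_id(fp))
        self.counts[node][fp] += 1
        return self.counts[node][fp]

    def copy_rank(self, fp: str) -> int:
        """Assumed CopyRank: how many times this fingerprint was reported."""
        node = self.successor(ring_id(fp))
        return self.counts[node][fp]


def is_spam(ring: ToyRing, body: str, threshold: int) -> bool:
    """Judge a message as spam when its CopyRank is higher than the threshold."""
    return ring.copy_rank(fingerprint(body)) > threshold


if __name__ == "__main__":
    ring = ToyRing([f"server-{i}" for i in range(4)])
    bulk = "Buy   cheap WATCHES now"
    for _ in range(5):  # five clients report the same bulk message
        ring.report(fingerprint(bulk))
    print(is_spam(ring, "buy cheap watches now", threshold=3))
    print(is_spam(ring, "Lunch at noon?", threshold=3))
```

Because every fingerprint is hashed to one responsible node, all reports of the same message accumulate in one place, which is what makes a single threshold comparison possible. Replacing `fingerprint` with a locality-sensitive digest would be needed to catch messages whose content has been altered.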
