Research on Spam Filtering Technology Based on Behavior Analysis
Abstract
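Since its emergence, e-mail has brought great convenience to people's work and daily lives, but the accompanying spam problem has grown increasingly serious. Spam consumes large amounts of network resources, seriously harms users' interests, and causes all kinds of inconvenience in work and daily life. How to identify spam quickly, efficiently, and accurately has therefore become a focus of research. This paper analyzes and summarizes existing spam filtering techniques and points out the shortcomings of traditional techniques in processing efficiency, resource utilization, and resistance to interference. Building on this analysis and on the behavioral characteristics of spam sending, it proposes two spam behavior analysis methods: one based on user relationship mining and reputation evaluation, and one based on mail transmission path mining.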
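The method based on user relationship mining and reputation evaluation builds a user relationship model from the communication relationships among users in a controlled network and computes a fingerprint for each message. It then mines a specific path set from the user relationship model. Finally, it builds a judgment record from the historical evaluations of the users on that path set and uses the record to decide whether the message is spam.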
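The model-building steps of the user-relationship method can be sketched in Python as below. The abstract does not say how fingerprints are computed or which two users the mined paths connect, so everything here is an illustrative assumption:

- `mail_fingerprint` hashes the sorted set of lowercase word tokens with SHA-1.
- `build_user_graph` treats each observed (sender, receiver) pair as an undirected relationship weighted by message count.
- `mine_paths` enumerates loopless paths of at most `max_hops` edges by depth-first search. The edge weights are not used by this search.

All three names are hypothetical. A real system would more likely use a fingerprint that tolerates near-duplicates, and a bounded k-shortest-paths search such as Yen's algorithm instead of exhaustive enumeration.

```python
import hashlib
import re
from collections import defaultdict

def mail_fingerprint(body):
    """Hypothetical fingerprint: SHA-1 of the sorted set of lowercase word tokens.

    Using the sorted unique tokens makes the digest ignore word order,
    repetition, and letter case.
    """
    tokens = sorted(set(re.findall(r"\w+", body.lower())))
    return hashlib.sha1(" ".join(tokens).encode("utf-8")).hexdigest()

def build_user_graph(log):
    """User relationship model: undirected graph, edge weight = mails exchanged."""
    graph = defaultdict(lambda: defaultdict(int))
    for sender, receiver in log:
        graph[sender][receiver] += 1
        graph[receiver][sender] += 1
    return graph

def mine_paths(graph, src, dst, max_hops=3):
    """Enumerate loopless paths from src to dst with at most max_hops edges (DFS)."""
    paths = []

    def dfs(node, path):
        if node == dst:
            paths.append(list(path))
            return
        if len(path) - 1 >= max_hops:  # path already uses max_hops edges
            return
        for neighbour in graph.get(node, {}):
            if neighbour not in path:  # keep the path loopless
                path.append(neighbour)
                dfs(neighbour, path)
                path.pop()

    dfs(src, [src])
    return paths
```

Consider the log `[("alice", "bob"), ("bob", "carol"), ("alice", "dave"), ("dave", "carol")]`. Calling `mine_paths(build_user_graph(log), "carol", "alice")` returns `[["carol", "bob", "alice"], ["carol", "dave", "alice"]]`. These are the chains of users linking the receiver `carol` to the sender `alice`.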
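The method based on mail transmission path mining builds a topology of mail server addresses from the information in the Received fields of the mail header. It computes each mail server's reputation, taking into account how the number of messages the server sends within a period of time affects that reputation. It then classifies a message according to the reputations of the mail servers on its transmission path.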
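The following sketch shows how the transmission path might be extracted from the Received fields using only Python's standard `email` and `re` modules. It rests on two assumptions:

- Each relay prepends one Received header.
- The header's `from` clause carries the connecting server's IPv4 address in square brackets.

This layout is common but not guaranteed. Received syntax is loosely specified, and headers added before the message reaches servers the receiver trusts can be forged. The functions `transmission_path`, `new_topology`, and `add_to_topology` are hypothetical names, not the paper's implementation.

```python
import email
import re
from collections import defaultdict

# First bracketed IPv4 address in a header, e.g. "from a.example (a.example [192.0.2.10])".
IP_IN_BRACKETS = re.compile(r"\[(\d{1,3}(?:\.\d{1,3}){3})\]")

def transmission_path(raw_message):
    """Return relay IPs ordered from the originating side toward the recipient.

    Relays prepend their Received header, so the headers are read bottom-up.
    Headers without a bracketed IPv4 address are skipped, and consecutive
    duplicates are collapsed.
    """
    msg = email.message_from_string(raw_message)
    path = []
    for header in reversed(msg.get_all("Received", [])):
        match = IP_IN_BRACKETS.search(str(header))
        if match and (not path or path[-1] != match.group(1)):
            path.append(match.group(1))
    return path

def new_topology():
    """Directed server graph: topology[src][dst] counts observed hops."""
    return defaultdict(lambda: defaultdict(int))

def add_to_topology(topology, path):
    """Add each consecutive pair of servers on a path as a directed edge."""
    for src, dst in zip(path, path[1:]):
        topology[src][dst] += 1
    return topology
```

Calling `add_to_topology` on the paths of many messages accumulates a server address topology. Per-server reputations can then be computed over that topology.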
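Experimental analysis shows that the method based on user relationship mining and reputation evaluation achieves high precision and recall and effectively resists interference from malicious users. The method based on mail transmission path mining can filter spam to a certain extent and can serve as an auxiliary means of spam detection.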
E-mail is a widely used communication tool. Although e-mail facilitates communication and cooperation, the spam problem is becoming increasingly serious, and how to combat spam effectively has long been a focus of research.
Based on a review of related research in China and abroad, we point out the shortcomings of conventional methods in mail processing efficiency, resource utilization, and resistance to interference. The paper then proposes a spam behavior analysis method based on user relationship mining and reputation reports, and a method based on mail transmission path mining.
The method based on user relationship mining and reputation reports builds a user relationship model from the communication among users in a controlled network and mines a specific path set from that model. It then classifies a message using a judgment record built from the historical reports of the users on the path set. When a receiver reports a message, the method updates the message fingerprint record and the users' reputations based on the judgment model and the receiver's report.
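The judgment and update steps could be sketched as a reputation-weighted vote. The weighting scheme is an assumption, not the paper's formula:

- Each user on the mined path set who has already reported a message with the same fingerprint votes with a weight equal to their reputation. Users with no history start at `DEFAULT_REP`.
- When a receiver reports the message, earlier reporters who agreed gain `STEP` reputation and those who disagreed lose `STEP`. Reputations are clamped to [0, 1].

`ReportStore`, `DEFAULT_REP`, and `STEP` are hypothetical. The path set is passed in as a list of user-id lists.

```python
from collections import defaultdict

DEFAULT_REP = 0.5  # hypothetical prior reputation for users with no history
STEP = 0.1         # hypothetical reputation change per agreement or disagreement

class ReportStore:
    """Hypothetical judgment record keyed by mail fingerprint."""

    def __init__(self):
        self.reports = defaultdict(dict)                    # fingerprint -> {user: is_spam}
        self.reputation = defaultdict(lambda: DEFAULT_REP)  # user -> weight in [0, 1]

    def judge(self, fingerprint, path_set):
        """Reputation-weighted vote of reporters on the path set.

        Returns True (spam), False (legitimate), or None when no reporter on
        the path set carries any weight.
        """
        users = {user for path in path_set for user in path}
        votes = [(self.reputation[user], is_spam)
                 for user, is_spam in self.reports[fingerprint].items()
                 if user in users]
        total = sum(weight for weight, _ in votes)
        if total == 0:
            return None
        score = sum(w if is_spam else -w for w, is_spam in votes) / total
        return score > 0  # ties count as legitimate

    def report(self, fingerprint, receiver, is_spam):
        """Record the receiver's verdict and re-weight earlier reporters."""
        for user, earlier in self.reports[fingerprint].items():
            if user == receiver:
                continue
            delta = STEP if earlier == is_spam else -STEP
            self.reputation[user] = min(1.0, max(0.0, self.reputation[user] + delta))
        self.reports[fingerprint][receiver] = is_spam
```

In this sketch, a reporter's weight rises and falls with how often later receivers agree with them. A user who files false reports therefore gradually loses influence. This is one plausible reading of how such a method resists malicious users, not the paper's exact update rule.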
The method based on mail transmission path mining builds a topology of mail server addresses from the Received information in the mail header. It takes into account the number of messages each server sends within a period of time when computing that server's reputation, and uses these reputations to classify messages.
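A server reputation that accounts for sending volume within a time window, together with a path-level verdict, might be sketched as follows. The formula is an assumption:

- Reputation is the Laplace-smoothed share of legitimate messages the server sent within the last `WINDOW` seconds.
- That share is scaled down when the server sent more than `BURST_LIMIT` messages in the window.
- A message is treated as spam when the least reputable server on its path falls below `SPAM_THRESHOLD`.

All three constants and the `ServerReputation` class are hypothetical. Events for each server are assumed to arrive in non-decreasing timestamp order.

```python
from collections import defaultdict, deque

WINDOW = 3600.0        # hypothetical sliding-window length, in seconds
BURST_LIMIT = 100      # hypothetical per-window volume above which reputation is damped
SPAM_THRESHOLD = 0.4   # hypothetical cut-off for the weakest server on a path

class ServerReputation:
    """Per-server reputation from recent, volume-aware sending history."""

    def __init__(self):
        self.events = defaultdict(deque)  # ip -> deque of (timestamp, was_spam)

    def observe(self, ip, ts, was_spam):
        """Record one message relayed by ip; timestamps must be non-decreasing per ip."""
        self.events[ip].append((ts, was_spam))

    def reputation(self, ip, now):
        events = self.events[ip]
        while events and events[0][0] < now - WINDOW:  # forget events outside the window
            events.popleft()
        total = len(events)
        legit = sum(1 for _, was_spam in events if not was_spam)
        rep = (legit + 1) / (total + 2)  # Laplace-smoothed share of legitimate mail
        if total > BURST_LIMIT:          # damp servers with unusually high volume
            rep *= BURST_LIMIT / total
        return rep

    def judge_path(self, path, now):
        """True = spam, False = legitimate, None = empty path (no evidence)."""
        if not path:
            return None
        return min(self.reputation(ip, now) for ip in path) < SPAM_THRESHOLD
```

Taking the minimum means one disreputable relay is enough to flag a message. Averaging over the path would be a more lenient alternative.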
The analysis and experimental results show two things. The method based on user relationship mining and reputation reports identifies spam effectively and is robust to malicious users. The method based on mail transmission path mining can identify spam to a certain extent.