Research on Spam Filtering Technology Based on Fingerprint Analysis
Abstract
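Email has become an indispensable tool in daily life and work, but the unchecked spread of spam has caused serious harm. Anti-spam technology has therefore long been an active research topic in China and abroad. This thesis analyzes the characteristics of spam in detail and examines methods for analyzing and controlling it.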
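     First, current spam filtering techniques in China and abroad are analyzed in detail, including their detection methods, effectiveness, and strengths and weaknesses. This analysis shows that, although these techniques can reach fairly high recognition accuracy, most of them begin analyzing a message only after it has been fully delivered; techniques that can operate during mail transmission, such as blacklists and reverse DNS lookups, are easily evaded by spam. The main work of this thesis is therefore to find an analysis technique that can accurately identify spam while it is still being transmitted.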
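     Second, because spammers usually forge mail headers, mainstream filtering techniques tend to ignore the information in certain header fields. Extensive comparison and analysis show that spam sent from the same source within a given period shares identical features in some header fields. To describe these features more precisely, this thesis proposes and implements a fingerprint analysis technique based on the mail header. The technique generates fingerprint data from five key header fields and compares it against a spam fingerprint database, so that spam sent in bulk can be accurately analyzed and identified during transmission. In addition, to make fingerprint extraction and comparison more efficient, the design and implementation use the MD5 hash algorithm and a binary tree structure.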
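     Finally, existing filtering techniques only identify spam and lack measures to curb the behavior of spam senders. Building on an in-depth analysis of TCP reliable transmission, this thesis designs and implements three mechanisms for controlling sending behavior: increasing response delay, dropping datagrams, and a hybrid mechanism. Based on the fingerprint analysis result, these mechanisms impede delivery of the remaining datagrams of a spam message to varying degrees, limiting the sender's transfer efficiency and thereby reducing its throughput. Experiments demonstrate that the technique is usable and effective. The fingerprint analysis and sending-behavior control techniques proposed here have already been integrated as key modules into an independently developed enterprise-level comprehensive spam reporting system.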
E-mail has become an indispensable tool in people's lives and work, but the flood of spam has caused great harm. Anti-spam technology has therefore long been a hot research topic at home and abroad. This thesis analyzes the characteristics of spam in detail and explores methods for detecting and controlling it.
     First, current domestic and foreign spam filtering technologies are analyzed in detail, including their detection methods, effectiveness, advantages, and disadvantages. The analysis finds that, although these technologies can achieve high identification accuracy, most start to identify spam only after a message has been completely received. Blacklists and reverse domain lookups, which can work during mail transfer, are easily evaded by spam. For this reason, the main work of the thesis is to find a better filtering technology that can accurately identify spam while it is in transit.
     Second, because spam senders usually forge mail headers, some header fields are ignored by mainstream filtering technologies. Extensive comparison and analysis show that spam sent from the same source within a period of time has the same features in certain header fields. To describe these characteristics better, this thesis presents and implements a fingerprint analysis technology based on the mail header. The technology generates a fingerprint from five key fields in the mail header and compares it against a spam fingerprint database, so that bulk spam can be accurately analyzed and identified during mail transmission. To improve the efficiency of fingerprint extraction and comparison, the design and implementation use the MD5 hash algorithm and a binary tree.
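The header fingerprinting idea can be sketched roughly as follows. This is a minimal illustration, not the thesis implementation: the abstract does not say which five header fields are used, how they are normalized, or what kind of binary tree holds the fingerprint database, so the field list `FINGERPRINT_FIELDS`, the lowercasing step, and the plain unbalanced search tree `FingerprintTree` are all assumptions made for the example.

```python
import hashlib
from email import message_from_string
from typing import Optional

# Hypothetical field choice: the abstract does not name the five key fields.
FINGERPRINT_FIELDS = ("From", "Reply-To", "Subject", "X-Mailer", "Content-Type")


def header_fingerprint(raw_message: str) -> str:
    """Return the MD5 hex digest of the selected header fields."""
    msg = message_from_string(raw_message)
    # A missing field contributes an empty string; values are trimmed and
    # lowercased (an assumed normalization) before hashing.
    parts = [str(msg.get(name, "")).strip().lower() for name in FINGERPRINT_FIELDS]
    return hashlib.md5("\n".join(parts).encode("utf-8")).hexdigest()


class _Node:
    __slots__ = ("key", "left", "right")

    def __init__(self, key: str) -> None:
        self.key = key
        self.left: Optional["_Node"] = None
        self.right: Optional["_Node"] = None


class FingerprintTree:
    """Unbalanced binary search tree keyed by fingerprint digest (assumed layout)."""

    def __init__(self) -> None:
        self._root: Optional[_Node] = None

    def add(self, key: str) -> None:
        """Insert a fingerprint; duplicates are ignored."""
        if self._root is None:
            self._root = _Node(key)
            return
        node = self._root
        while True:
            if key == node.key:
                return
            side = "left" if key < node.key else "right"
            child = getattr(node, side)
            if child is None:
                setattr(node, side, _Node(key))
                return
            node = child

    def __contains__(self, key: str) -> bool:
        """Walk down from the root, comparing digests as strings."""
        node = self._root
        while node is not None:
            if key == node.key:
                return True
            node = node.left if key < node.key else node.right
        return False
```

A gateway built this way would call `header_fingerprint` as soon as the header block of a message has arrived and test the digest with `digest in tree`, so the decision does not depend on the message body. Hashing reduces the five fields to one fixed-length key, which keeps each tree comparison cheap regardless of header length.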
     Finally, current filtering techniques only identify spam and lack measures to inhibit the sender. Based on an analysis of TCP reliable transmission, this thesis designs and implements three sender behavior control mechanisms: increasing response delay, discarding packets, and a mixed mechanism. Using the result of fingerprint analysis, these mechanisms block the transmission of the remaining spam data to different degrees, limit the sender's transmission efficiency, and thus reduce its throughput. Experiments show that the technology is usable and effective. The fingerprint analysis and sending behavior control technologies discussed in this thesis have been integrated into an enterprise-class comprehensive spam reporting system.
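The three control mechanisms can be pictured as a per-packet policy applied once a connection's fingerprint has matched. The sketch below only models that decision and does not touch real network traffic. The names `Mechanism`, `Action`, and `control_action`, the default delay of 2 seconds, the drop probability of 0.5, and the way the mixed mode combines the other two are all assumptions, since the abstract gives no parameters. The underlying reasoning is standard TCP behavior: a sender paces itself on acknowledgements, so delayed responses stretch its round-trip time, and lost packets force retransmission and shrink its congestion window.

```python
import random
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class Mechanism(Enum):
    DELAY = "delay"  # increase response delay
    DROP = "drop"    # discard packets
    MIXED = "mixed"  # combination of both (assumed semantics)


@dataclass(frozen=True)
class Action:
    drop: bool      # True: discard this packet
    delay_s: float  # seconds to hold the packet or response before forwarding


def control_action(
    is_spam: bool,
    mechanism: Mechanism,
    *,
    delay_s: float = 2.0,    # hypothetical default
    drop_prob: float = 0.5,  # hypothetical default
    rng: Optional[random.Random] = None,
) -> Action:
    """Decide what to do with the next packet of a connection."""
    if not is_spam:
        # Connections without a fingerprint match pass through untouched.
        return Action(drop=False, delay_s=0.0)
    rng = rng if rng is not None else random.Random()
    if mechanism is Mechanism.DELAY:
        return Action(drop=False, delay_s=delay_s)
    if mechanism is Mechanism.DROP:
        return Action(drop=rng.random() < drop_prob, delay_s=0.0)
    # MIXED (assumed): drop some packets and delay the ones that survive.
    dropped = rng.random() < drop_prob
    return Action(drop=dropped, delay_s=0.0 if dropped else delay_s)
```

For example, `control_action(True, Mechanism.DELAY)` returns an action that forwards the packet after the configured delay, while a call with `is_spam=False` always returns a pass-through action.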
