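Study on Key Technology and Algorithm of Immune Email System

Chinese title: 免疫电子邮件系统关键技术和算法研究
Author: 郭江鸿 (Guo Jianghong)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: mail system; spam; immune system; immune algorithm
Degree year: 2007
Supervisor: 吴良杰 (Wu Liangjie)
Discipline code: 081203
Degree-granting institution: Harbin Engineering University
Submission date: 2007-05-15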

Abstract

With the rapid development and spread of the Internet, e-mail has become one of the most important means of communication in everyday life because it is convenient, fast, and inexpensive. The spam that has grown alongside it, however, consumes large amounts of network resources, wastes users' valuable time, and causes enormous economic losses. Researching and designing an e-mail system that filters spam efficiently is therefore of great practical significance.
     The artificial immune system (AIS) is an emerging research field that extracts, discovers, and applies effective mechanisms of the biological immune system to solve engineering and scientific problems. The aim of this work is to use AIS mechanisms to design an e-mail system that is immune to spam, and thereby to explore an effective route for applying AIS in this area.
     Building on a comprehensive analysis of the principles, strengths, and weaknesses of traditional anti-spam techniques, the thesis studies biological immune mechanisms and their applications and algorithms in computing in depth, and proposes a new design for an immune e-mail system. The system adapts dynamically to its external environment and can promptly recognize mail with previously unseen features, which is precisely where traditional techniques fall short.
     The main contribution of the thesis is the proposal and design of an immune e-mail system model. Tailored to the characteristics of e-mail systems, the model defines an encoding for antigens and antibodies, designs and improves an affinity calculation method, and incorporates a range of immune mechanisms, providing a dedicated solution for each of the immune properties it aims to exhibit: self-learning, memory, self-adaptation, and diversity. The clonal selection algorithm is the main target of improvement, and the improved version preserves antibody diversity.
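The abstract does not specify the antigen/antibody encoding, the affinity measure, or the improved clonal selection algorithm, so the following is only a minimal sketch of a generic CLONALG-style loop under stated assumptions, not the author's method. It assumes that messages (antigens) and detectors (antibodies) are encoded as sets of tokens, that affinity is the Jaccard similarity between two token sets, and that diversity is maintained by removing duplicate antibodies and injecting random newcomers built from a token "gene library". All names (`affinity`, `mutate`, `random_antibody`, `clonal_selection`, `gene_library`) are hypothetical.

```python
import random

# ASSUMPTIONS (hypothetical, not from the thesis): antibodies and antigens are
# frozensets of tokens; affinity is Jaccard similarity; diversity is kept by
# de-duplication plus random newcomers drawn from a token gene library.

def affinity(antibody, antigen):
    """Jaccard similarity: shared tokens divided by all distinct tokens."""
    if not antibody or not antigen:
        return 0.0
    return len(antibody & antigen) / len(antibody | antigen)

def mutate(antibody, gene_library, rng):
    """Hypermutation: swap one token of a non-empty antibody for a library token."""
    tokens = sorted(antibody)  # sorted so results are reproducible for a given seed
    tokens[rng.randrange(len(tokens))] = rng.choice(gene_library)
    return frozenset(tokens)

def random_antibody(gene_library, length, rng):
    """Receptor editing: build a brand-new antibody from the gene library."""
    return frozenset(rng.sample(gene_library, min(length, len(gene_library))))

def clonal_selection(population, antigen, gene_library, n_select=3,
                     clone_factor=4, n_random=1, generations=10, seed=0):
    rng = random.Random(seed)
    pop = list(dict.fromkeys(population))  # drop duplicate antibodies
    size = len(pop)
    for _ in range(generations):
        ranked = sorted(pop, key=lambda ab: affinity(ab, antigen), reverse=True)
        clones = []
        for rank, ab in enumerate(ranked[:n_select]):
            # Higher-affinity antibodies receive more clones.
            for _ in range(clone_factor * (n_select - rank)):
                clones.append(mutate(ab, gene_library, rng))
        # Parents and clones compete; duplicates are removed to keep diversity.
        pool = sorted(dict.fromkeys(ranked + clones),
                      key=lambda ab: affinity(ab, antigen), reverse=True)
        survivors = pool[:max(1, size - n_random)]  # the best antibody always survives
        # Fill the weakest slots with random newcomers (diversity step).
        newcomers = [random_antibody(gene_library, len(survivors[0]), rng)
                     for _ in range(n_random)]
        pop = list(dict.fromkeys(survivors + newcomers))
    return sorted(pop, key=lambda ab: affinity(ab, antigen), reverse=True)

if __name__ == "__main__":
    spam_antigen = frozenset({"free", "winner", "prize", "click", "offer"})
    library = ["free", "winner", "prize", "click", "offer",
               "meeting", "report", "lunch", "invoice", "hello"]
    initial = [frozenset({"meeting", "report", "lunch"}),
               frozenset({"free", "hello", "invoice"}),
               frozenset({"click", "lunch", "report"}),
               frozenset({"hello", "invoice", "meeting"})]
    best = clonal_selection(initial, spam_antigen, library, generations=20)
    print(sorted(best[0]), affinity(best[0], spam_antigen))
```

Because parents compete with their clones, the best affinity never decreases between generations; the de-duplication and random-newcomer steps are one simple way to address the antibody-diversity concern the abstract raises, and the thesis's own improvement may differ.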
     Finally, simulation experiments are carried out on the immune e-mail system model, comparing results before and after the algorithm improvements. The improved algorithm raises the system's recall to some extent while also lowering its false-positive rate, which demonstrates the effectiveness of the improvements and the applicability and soundness of the immune e-mail system model.
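The abstract reports recall and a false-positive (false-alarm) rate without giving formulas. The sketch below assumes the common definitions: recall is the fraction of spam messages that the filter flags as spam, and the false-positive rate is the fraction of legitimate messages wrongly flagged as spam. Some spam-filtering work defines the false-alarm rate differently (for example, relative to all flagged mail), so the thesis's exact definition may not match. The function name `spam_metrics` is hypothetical.

```python
def spam_metrics(y_true, y_pred):
    """Return (recall, false_positive_rate); labels: 1 = spam, 0 = legitimate.

    ASSUMED definitions: recall = TP / (TP + FN), false-positive rate = FP / (FP + TN).
    """
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)  # spam caught
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)  # spam missed
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)  # legitimate mail flagged
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)  # legitimate mail passed
    recall = tp / (tp + fn) if tp + fn else 0.0
    fpr = fp / (fp + tn) if fp + tn else 0.0
    return recall, fpr

if __name__ == "__main__":
    truth = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
    preds = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]
    print(spam_metrics(truth, preds))  # (0.75, 0.16666666666666666)
```

Comparing these two numbers for the original and improved algorithm on the same labelled test set is the kind of before/after comparison the abstract describes.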

