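Design and Implementation of a Spam Firewall Software System Based on a Transparent Bridge

Author: Li Xiao (李晓)
Degree level: Master's
Discipline: Information Security
Keywords: spam; iptables; netfilter; firewall
Degree year: 2008
Advisor: Qin Zhiguang (秦志光)
Discipline code: 081203
Degree-granting institution: University of Electronic Science and Technology of China (电子科技大学)
Thesis submission date: 2008-03-01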

Abstract

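In the information society, the network is a major channel for publishing information, yet massive volumes of spam consume a large share of the limited bandwidth. Many filtering techniques exist, but most current solutions are tied to server or client software, and neither their generality nor their filtering accuracy is satisfactory. Building a unified, efficient spam-filtering architecture is therefore an important research topic in anti-spam technology.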
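     The main research content and contributions of this thesis are twofold:
     (1) An extensible spam firewall system based on a transparent bridge was developed. The system combines transparent bridging, iptables, and netfilter to build a real-time filtering system. It holds back the relevant packets, then emulates the network protocol stack to perform protocol analysis and reconstruction at the IP layer (fragment handling), the TCP layer (stream reassembly), and the application layer, recovering the complete message. Messages are stored in a double-buffered queue, and a multithreaded scheduler invokes the filtering algorithms in parallel. Depending on the verdict, the corresponding packets are forwarded or dropped, which achieves real-time spam filtering.
     (2) A behavior-based spam filtering algorithm is proposed. The system applies support vector machine theory to behavior-based spam filtering, and also implements rule-matching filtering and an improved Bayes filtering algorithm. The system is highly extensible: users can define which protocol types it filters according to their needs, and better filtering algorithms can be added to the engine.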
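The processing path in point (1) (stream reassembly, a double-buffered queue, and parallel filter calls) can be pictured with a minimal Python sketch. This is an illustration under stated assumptions, not the thesis's implementation: the names `reassemble`, `DoubleBuffer`, and `filter_batch` are hypothetical, TCP sequence-number wraparound is ignored, and a single consumer thread is assumed to drain the buffer.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def reassemble(segments):
    """Rebuild a byte stream from (sequence_number, payload) TCP segments.

    Segments may arrive out of order or be retransmitted. Bytes that were
    already emitted are skipped. Sequence wraparound is not handled.
    """
    stream = bytearray()
    next_seq = None
    for seq, payload in sorted(segments):
        if next_seq is None:
            next_seq = seq  # first segment defines the start of the stream
        if seq > next_seq:
            raise ValueError("gap in stream: a segment is missing")
        skip = next_seq - seq  # bytes of this segment we already have
        if skip < len(payload):
            stream += payload[skip:]
            next_seq = seq + len(payload)
    return bytes(stream)


class DoubleBuffer:
    """Two lists: producers fill one while the consumer drains the other."""

    def __init__(self):
        self._front = []  # receives new messages
        self._back = []   # handed to the consumer on swap
        self._lock = threading.Lock()

    def put(self, item):
        with self._lock:
            self._front.append(item)

    def swap(self):
        """Exchange the buffers and return everything queued so far.

        Assumes one consumer thread; producers are only blocked for the
        duration of the pointer swap, not while the batch is processed.
        """
        with self._lock:
            self._front, self._back = self._back, self._front
        batch = list(self._back)
        self._back.clear()
        return batch


def filter_batch(messages, is_spam, workers=4):
    """Judge messages in parallel; return (message, verdict) pairs in order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        verdicts = list(pool.map(is_spam, messages))
    return [(m, "drop" if v else "forward") for m, v in zip(messages, verdicts)]
```

In this sketch a capture thread would call `put` for each reconstructed message, while a scheduler thread periodically calls `swap` and passes the batch to `filter_batch` together with whichever filter function is configured.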
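     The system is independent of both server and client: it neither runs on the server nor acts as a client plug-in, but is deployed in front of the server. Because it operates as a transparent bridge, the server is unaware of it. Spam is filtered in real time before it reaches the server, which reduces server load and also protects clients. The functional modules are "hot-pluggable": the host running the system can be controlled remotely over the web, and editing its configuration file adds or removes modules; the system detects the change and updates itself automatically. Online upgrades are also supported: when new rules are published, a listener process downloads the rule file and updates the system's rules, which take effect immediately without a restart. Depending on configuration, the system acts as a spam email firewall in front of a mail server, a spam SMS firewall in front of a short-message server, or a spam web page firewall in front of a web server. The system was first trialed in the laboratory, then underwent technical review and trial use at Huawei in Beijing, where it received positive feedback.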
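One simple way to detect a changed configuration file without restarting is to poll its modification time. The sketch below assumes a JSON file with a `modules` list; the file format, the class name `ModuleConfig`, and the polling approach are all assumptions made for illustration, since the abstract does not say how the system detects changes.

```python
import json
import os


class ModuleConfig:
    """Tracks a config file and reloads the module list when it changes."""

    def __init__(self, path):
        self.path = path
        self._mtime = None  # modification time of the last loaded version
        self.modules = []

    def reload_if_changed(self):
        """Reload the file if its mtime differs from the last load.

        Returns True when a reload happened, False otherwise.
        """
        mtime = os.stat(self.path).st_mtime_ns
        if mtime == self._mtime:
            return False
        with open(self.path, encoding="utf-8") as fh:
            self.modules = json.load(fh)["modules"]
        self._mtime = mtime
        return True
```

A monitoring loop would call `reload_if_changed()` at a fixed interval and enable or disable filter modules according to `modules` whenever it returns True.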
In the information society, the network is a primary channel for publishing information, but a large share of network bandwidth is consumed by spam. Although many solutions attempt to address the problem, most are tied to the server or are client plug-ins, and they do not work well. How to build an efficient and extensible framework for filtering spam is therefore an important issue.
     The main content and contributions of this thesis can be summarized in two points:
     (1) The thesis describes an extensible spam-filtering firewall. The system uses a network bridge, iptables, and netfilter to build a real-time filtering system. It captures IP packets, then reconstructs the IP and TCP layers to recover the complete message. A double-buffered queue stores the messages, and a thread pool provides multithreaded processing. Once a verdict is reached, the message is forwarded or discarded.
     (2) The system includes a behavior-based filter using a support vector machine, a rule-based filter, and an improved Bayes-based filter. It is highly extensible: new protocols and new algorithms can be added easily.
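To make the Bayes-based filter concrete, here is a minimal sketch of a standard multinomial naive Bayes classifier with Laplace smoothing. It is not the improved variant developed in the thesis, whose details the abstract does not give; the class name `NaiveBayesFilter`, the whitespace tokenizer, and the labels `spam` and `ham` are assumptions for illustration.

```python
import math
from collections import Counter


class NaiveBayesFilter:
    """Multinomial naive Bayes over whitespace-separated words."""

    def __init__(self):
        self.word_counts = {"spam": Counter(), "ham": Counter()}
        self.doc_counts = {"spam": 0, "ham": 0}

    def train(self, text, label):
        """Add one labelled message; label must be 'spam' or 'ham'."""
        self.word_counts[label].update(text.lower().split())
        self.doc_counts[label] += 1

    def _log_score(self, words, label, vocab_size):
        counts = self.word_counts[label]
        total = sum(counts.values())
        # Log prior: fraction of training messages with this label.
        score = math.log(self.doc_counts[label] / sum(self.doc_counts.values()))
        for word in words:
            # Laplace smoothing keeps unseen words from zeroing the score.
            score += math.log((counts[word] + 1) / (total + vocab_size))
        return score

    def is_spam(self, text):
        """Classify a message; needs at least one example of each label."""
        words = text.lower().split()
        vocab = set(self.word_counts["spam"]) | set(self.word_counts["ham"])
        size = len(vocab)
        return self._log_score(words, "spam", size) > self._log_score(words, "ham", size)
```

Because the classifier exposes a single `is_spam(text)` predicate, it could sit behind the same interface as a rule-based or SVM-based filter, which is one way to read the extensibility claim above.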
     The system is independent of both server and client: it is neither installed on the server nor run as a client plug-in. It sits in front of the server and filters spam in real time, so spam reaches neither the server nor the client. Modules can be added or removed by changing the configuration file over the web. When the system detects that the configuration has changed, it reloads it automatically. It also supports automatic upgrades: when new rule files are available, it downloads them and applies the new rules immediately. The system can be configured as a mail firewall in front of a mail server, a short-message firewall in front of a short-message server, or a web page firewall in front of a web server. The system was used in the laboratory, passed tests at Huawei in Beijing, and received a good response.

