面向流量分析的流模式匹配技术
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
网络流量识别在网络安全、网络监控以及负载均衡等方面起着重要的作用,已成为网络管理的基础和前提。但随着加密流量和各种新应用的不断涌现,网络流量识别也面临着巨大的挑战。目前的各种网络流量识别方法(基于端口、基于载荷以及统计特征的算法)都有着各自显著的优缺点,且均不能满足现今数据流分析的需要。因此需要设计一种简单、规范的机制,来融合各种识别方法的优点,使其协同工作,实现高效、准确的网络流量识别。
     本文提出一种流模式匹配技术,通过定义形式化的描述规范,使得用户可以按照一定语法和语义,用一个“流模式”来无歧义的表示一系列具有某些共同特征的数据流;并针对这种表示方法进行解析、处理,设计出基于非确定性有限自动机(NFA)和位并行算法的流模式匹配引擎,实时地从截获的网络数据中准确区分出指定的流。
     本论文主要完成了以下工作:
     1.给出流模式的定义以及规范化设计。该模式融合了现有的几种流量识别算法,并可以灵活方便的进行书写和扩展。
     2.根据流模式自身的特点,构建了适合流模式的专用的解析树构造器,将用户设定的流模式解析成便于程序处理的树型结构。
     3.在解析树的基础上,设计出基于非确定性有限自动机和位并行搜索算法的流模式匹配引擎。
     4.对流模式匹配引擎进行系统测试、验证,给出理论分析和功能测试。测试结果验证匹配引擎能有效地完成流量的识别。
Recognizing network traffic has always been the basis of network security and management due to its applications in network security, network monitoring, load balancing, etc. But it has faced enormous challenges due to the continued emergence of new applications and encrypted traffic. The currently available approaches such as port-based classification, payload-based classification and statistical-based classification, have respective advances and backwards, and none of them perform well for all different network data on the internet nowadays. Thus a kind of simple and standard mechanism which includes the advantages of different methods and provides a high level of recognition efficiency and accuracy should be proposed and designed.
     This thesis proposes a stream pattern matching technique. By defining a formal description specification, any series of data stream with common features can be unambiguously described by a special stream pattern, according to a certain grammar and lexeme. And then the stream pattern is parsed in order to obtain a tree representation; Above that, a stream pattern engine is constructed based on the Non-finite automata (NFA) and Bit-parallel searching algorithms.
     The main contributions in this thesis are as follows:
     1. Put forward a definition and specification design of stream pattern. This pattern combines different approaches at present, and can be flexibly written and expanded.
     2. Build a special stream pattern parser based on its own features, in order to obtain the tree structure which is easily processed.
     3. Construct a stream pattern engine based on the NFA-based FSM and Bit-parallel searching algorithms after the parse tree is built.
     4. Test and verify the performance of stream pattern engine, and give the analysis of the theory and the functional test. The functional test result shows the effectiveness of the stream pattern engine.
引文
[1]Kim M S, Kang H J, Hong J W. Towards Peer-to-Peer Traffic Analysis Using Flows [C]//Self Managing Distributed Systems,14h IFIP/IEEE Workshop on Distributed Systems:Operations and Management. Heidelberg, Germany,2003:55-67
    [2]Kang H J, Ju H T, Kim M S, et al.Towards Streaming Media Traffic Monitoring and Analysis [C]//Proceedings of 2002 Asia-Pacific Network Operations and Management Symp. Jeju, Korea:APNOMS,2002:503-504
    [3]LEIBOWITZ N, RIPEANU M, WIERZBICKI A. Deconstructing the Kazaa network [C]//Proceedings of 3rd IEEE Workshop on Internet Applications (WIAPP'03).Piscataway, NJ, USA:IEEE,2003:112-120
    [4]N. Leibowitz, A. Bergman, R. Ben-Shaul, et al. Are file swapping networks cacheable? Characterizing P2P traffic [C]//Proceedings of the 7th International Workshop on Web Content Caching and Distribution (WCW). Boulder,Co, 2002:121-134.
    [5]IANA. http://www.iana.org/assignments/port-numbers
    [6]Constantinou F, Mavrommatis P. Identifying Known and Unknown Peer-to-Peer Traffic [C]//Fifth IEEE International Symposium on Network Computing and Applications. Massachusetts, USA:Cambridge,2006:93-102
    [7]Kang H J, Kim M S, Hong J W-K.A Method on Multimedia Service Traffic Monitoring and Analysis [C]//In DSOM 2003. Lecture Notes in Computer Science 2867. Heidelberg, Germany,2003:93-105
    [8]Callado A, Kamienski C, Szabo G, et al.A Survey on Internet Traffic Identification [J]. IEEE Communications Surveys & Tutorials,2009, 11(3):37-52
    [9]赵瑞.基于特征串的p2p流量识别研究与实现[D].成都.电子科技大,2009
    [10]陈亮,龚俭,徐选.基于特征串的应用层协议识别[J].计算机工程与应用,2006,24(4):16-19
    [11]丁晶,陈晓岚,吴萍.基于正则表达式的深度包检测算法[J].计算机应用,2007,27(9):2184-2186
    [12]Zuev D, Moore A W. Traffic Classification Using a Statistical Approach[C]//PAM 2005. Boston, MA,2005:324-328
    [13]Moore A W, Zuev D, Crogan M. Discriminators for use in flow based classification[R].Department of Computer Science, Queen Mary, University of London, 2005
    [14]Moore A W, Zuev D. Internet traffic classification using Bayesian analysis techniques [C]//ACM SIGMETRICS. New York:ACM Press,2005:50-60
    [15]Zander, S.; Nguyen, T. and Armitage, G., Automated Traffic Classification and Application Identification Using Machine Learning [C]//Proc of the 30th IEEE Conference on Local Computer Networks Anniversary. Washington DC:IEEE Computer Society,2005:250-257
    [16]Bernaille, Laurent and Teixeira, Renata.Early Recognition of Encrypted Applications [C]//Proc of the 8th International Conference, Passive and Active Measurement Conference.Louvain-la-Neuve, Belgium,2007:165-175
    [17]Kim MS, Won YJ, Hong J W K. Application-Level Traffic Monitoring and an Analysis on IP Networks [J]. ETRI journal,2005,27(11):22-42
    [18]More A W, Papagiannaki K.Toward the Accurate Identification of Network Application[C].//PAM 2005. Boston, MA,2005:41-54
    [19]Szabo G, Szabo 1, Orincsay D. Accurate Traffic Classification [C]//IEEE 2007 International Symposium on a World of Wireless, Mobile and Multimedia Networks. Helsinki, Finland:Proceedings,2007:1-8
    [20]Karagiannis, T, Papagiannaki, K, Faloutsos, M. BLINC:Multilevel Traffic Classification in the Dark [C]//Proceedings of the ACM SIGCOMM 2005 Conference on Applications, Technologies, Architectures, and Protocols for Computer Communications.Philadelphia, Pennsylvania, USA,2005:229-240
    [21]Haffner P, Sen S, Spatscheck O, et al. AC AS:automated construction of application signatures[C]//ACM SIGCOMM Workshop on MineNet 2005. Philadelphia: ACM Press,2005:107-202
    [22]陈曙晖.基于内容分析的高速网络协议识别技术研究[D].长沙.国防科学技术大学,2009
    [23]San S, Spatscheck O, Wang D. Accurate, scalable in-network identification of P2P traffic using application signatures[C]//Proceeding of the 13th International World Wide Web Conference. NY:ACM Press,2004:512-521
    [24]Levandoski J, Sommer E, Strait M. Application Layer Packet Classifier for Linux[CP/OL]. http://17-filter.sourceforge.net/.2006.
    [25]Okabe T, Kitamura T, Shizuno T. Statistical traffic identification method based on flow-level behavior for fair VoIP service [C]//IEEE Workshop on VoIP Management and Security. Vancouver:IEEE Press,2006:35-40
    [26]Frank J. Machine Learning and Intrusion Detection:Current and Future Directions[C]//Proceedings of the National 17th Computer Security Conference. Baltimore, MD,1994:763-769
    [27]More A W, Zuev D. Internet traffic classification using Bayesian analysis techniques [C]//ACM SIGMETRICS. New York:ACM Press,2005:50-60
    [28]MITCHELL TM.机器学习[M].曾华军,张银奎,等译.北京:机械工业出版社,2003
    [29]Bernaille L, Teixeira R, Akodkenou I, et al. Traffic Classification On The Fly [J]. ACM SIGCOMM Computer Communication Review,2006,36(2):23-26
    [30]T.Hastie, R. Tibshirani, and J.Friedman.The Elements of Statistical Learning:Data Mining, Inference and Prediction [M]. New York:Springer,2001
    [31]Roughan M, Sen S, Spatscheck O, et al. Class-of-Service Mapping for QoS:A Statistical Signature-based Approach to IP Traffic Classification[C]//Proc.ACM. SIGCOMM IMC 2004.Taormina, Italy,2004:135-148
    [32]Witten H, Frank E. Data Mining [M].Morgan Kaufmann Publishers,2000
    [33]Cheeseman P, Stutz J. Bayesian Classification (Autoclass):Theory and Results [C]//Advances in Knowledge Discovery and Data Mining. USA:AAAI/MIT Press, 1996:153-180
    [34]McGregor A, Hall M, Lorier P, et al. Flow Clustering Using Machine Learning Techniques [C]//PAM 2004.Antibes Juan-lesPins, France,2004:205-214
    [35]Dempster A, Paird N, Rubin D. Maximum likelihood from incomeplete data via the EM algorithm [J]. Journal of the Royal Statistical Society,1977,39(1):1-38
    [36]Jain A K, Dubes R C. Algorithms for Clustering Data [M]. Prentice Hall, Englewood Cliffs, USA,1988
    [37]Erman J, Arlitt M, Mahanti A. Traffic Classification using Clustering Algorithms[C]//proc of SIGCOMM*06 Workshop on Mining Network Data (MineNet). Pisa, Italy,2006:11-15
    [38]Friedl J E F.精通正则表达式[M].余晟译.3.北京:电子工业出版社,2007
    [39]Sperberg-McQueen C M. Notes on finite state automata with counters[EB/OL]. http://www.w3.org/XML/2004/05/msm-cfa.html.2004
    [40]Kilpelainen P, Tuhkanen R. Regular Expressions with Numerical Occurrence Indicators-preliminary results [C]//Proceedings of the Eighth Symposium on Programming Languages and Software Tools.SPLST'03,Kuopio, Finland,2003:163-173
    [41]Kilpelainen P, Tuhkanen R. One-unambiguity of regular expressions with numeric occurrence indicators [J]. Inf. Comput,2007,205(6):890-916
    [42]Becchi M, Crowley P. Extending Finite Automata to Efficiently Match Perl-Compatible Regular Expressions [C]//Proceedings of the 2008 ACM Conference on Emerging Network Experiment and Technology.CoNEXT 2008, Madrid, Spain, 2008:25
    [43]Becchi, M., Crowley, P.A Hybrid Finite Automaton for Practical Deep Packet Inspection [C]//ACM CoNEXT 2007.New York, NY, USA,2007:1-12
    [44]Gelade W, Gyssens M, Martens W. Regular Expressions with Counting:Weak versus Strong Determinism [C]//Proceedings of Mathematical Foundations of Computer Science 2009,34th International Symposium. Novy Smokovec, High Tatras, Slovakia, 2009:369-381
    [45]Yun S, Lee K. Regular Expression Pattern Matching Supporting Constrained Repetitions [C]//Proceedings of Reconfigurable Computing:Architectures, Tools and Applications,5th International Workshop. Karlsruhe, Germany,2009:300-305
    [46]XML:http://www.w3.org/XML/
    [47]Hopcroft J E, Motwani R, Ullman J D. Introduction to Automata Theory, Languages, and Computation [M]. Addison Wesley,2001
    [48]Yu F, Chen Z, Diao Y, et al. Fast and memory-Efficient regular Expression Matching for Deep Packet Inspection [C]//Proceedings of the 2006 ACM/IEEE Symposium on Architecture for Networking and Communications Systems. San Jose, California, USA,2006:93-102
    [49]Gluskov V M.The abstract theory of automata [J]. Russian Mathematical Surveys, 1961,16:1-53.
    [50]Berry G, Sethi R.From regular expression to deterministic automata [J]. Theoretical Computer Science,1986,48(1):117-126
    [51]Bruggemann-Klein A. Regular expressions into finite automata [J]. Theoretical Computer Science,1993,120(2):197-213
    [52]Chang C H, Paige R.From regular expression to DFA's using NFA's [C]//Proceedings of the 3rd Annual Symposium on Combinatorial Pattern Matching, number 664 in Lecture Notes in Computer Science. Springer-Verlag,1992:90-110
    [53]Hromkovic J, Seibert S, Wilke T. Translating regular expressions into small epsilon-free nondeterministic finite automata [C]//Proceedings of the 14th Annual Symposium on Theoretical Aspects of Computer Science, number 1200 in Lecture Notes in Computer Science.Springer-Verlag,1997:55-66
    [54]Thompson K. Regular expression search algorithm [J].Communications of the ACM,1968,11:419-422
    [55]Navarro G, Raffinot M. Flexible Pattern Matching in Strings:Practical on-line search algorithms for texts and biological sequences [M].USA:Cambridge University Press,2002
    [56]LIBXML:http://www.xmlsoft.org/
    [57]Watson B W.A new regular grammar pattern matching algorithm [C]//Proceedings of the 4th Annual European Symposium, number 1136 in Lecture Notes in Computer Science. Springer-Verlag,1996:364-377
    [58]Navarro G, Raffinot M. Fast regular expression search [C]//Proceedings of the 3rd Workshop on Algorithm Engineering, number 1668 in Lecture Notes in Computer Science. Springer-Verlag,1999:199-213
    [59]Navarro G. Nr-grep:a fast and flexible pattern matching tool [J]. Software Practice and Experience (SPE),2001,31(13):1265-1312
    [60]Wu S, Manber U. Fast text searching allowing errors [J]. Communications of the ACM,1992,35(10):83-91
    [61]Navarro G, Raffinot M.Compact DFA representation for fast regular expression search [C]//Proceedings of the 5th Workshop on Algorithm Engineering, number 2141 in Lecture Notes in Computer Science,2001:1-12
    [62]Navarro G, Raffinot M.New Techniques for Regular Expression Searching [J]. Algorithmica,2004,41(2):89-116
    [63]Nicand C. On the Average Size of Glushkov's Automata[C]//Proceedings of Language and Automata Theory and Applications, Third International Conference.Tarragona, Spain, April 2-8,2009:626-637

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700