摘要
针对基于确定有限状态自动机的匹配引擎在大规模、复杂规则下会出现状态爆炸的问题,提出正则表达式子串抽取算法。通过将子串抽取算法应用于DFA状态爆炸场景,设计基于子串抽取的正则匹配引擎。实验结果表明,该算法在单个规则上运行时间可达10 ms量级,抽取率高达99%,同时匹配引擎具有较好的稳定性和可拓展性,且匹配速度优于相关开源匹配引擎。
Aiming at the problem that the Deterministic Finite Automation(DFA)-based matching engine will explode under large-scale and complex rules,a regular expression substring extraction algorithm is proposed.A substring extraction based regular matching engine is designed by applying the substring extraction algorithm to the DFA state explosion scenario.Experimental results show that the algorithm can run on a single rule up to 10 ms,and the decimation rate is as high as 99%.At the same time,the matching engine has better stability and scalability,and the matching speed is better than the related open source matching engine.
引文
[1] 褚衍杰,李云照,魏强.一种改进的多模式匹配算法[J].西安电子科技大学学报(自然科学版),2014(6):174-180.
[2] 赵毅,朱鹏,迟学斌,等.浅析高性能计算应用的需求与发展[J].计算机研究与发展,2007,44(10):1640-1646.
[3] XU C,CHEN S,SU J,et al.A survey on regular expression matching for deep packet inspection:applications,algorithms,and hardware platforms[J].IEEE Communications Surveys and Tutorials,2016,18(4):2991-3029.
[4] 刘燕兵.高性能正则表达式匹配技术研究[D].北京:中国科学院大学,2012.
[5] BECCHI M,CADAMBI S.Memory-efficient regular expression search using state merging[C]//Proceedings of the 26th IEEE International Conference on Computer Communications.Washington D.C.,USA:IEEE Press,2007:1064-1072.
[6] FICARA D,GIORDANO S,PROCISSI G,et al.An improved DFA for fast regular expression matching[J].ACM SIGCOMM Computer Communication Review,2008,38(5):29-40.
[7] FICARA D,PIETRO A D,GIORDANO S,et al.Differential encoding of DFAs for fast regular expression matching[J].IEEE/ACM Transactions on Networking,2011,19(3):683-694.
[8] KONG S,SMITH R,ESTAN C.Efficient signature matching with multiple alphabet compression tables[C]//Proceedings of the 4th International Conference on Security and Privacy in Communication Networks.New York,USA:ACM Press,2008.
[9] LIU Y,GUO L,LIU P,et al.Compressing regular expressions’ DFA table by matrix decomposition[C]//Proceedings of the 15th International Conference on Implementation and Application of Automata.Berlin,Germany:Springer,2010:282-289.
[10] SMITH R,ESTAN C,JHA S.XFA:faster signature matching with extended automata[C]//Proceedings of IEEE Symposium on Security and Privacy.Washington D.C.,USA:IEEE Computer Society,2008:187-201.
[11] SMITH R D.Toward robust network payload inspection[D].Madison,USA:University of Wisconsin at Madison,2009.
[12] YANG Y E,PRASANNA V K.Space-time tradeoff in regular expression matching with semi-deterministic finite automata[C]//Proceedings of IEEE International Conference on Computer Communications.Washington D.C.,USA:IEEE Press,2011:1853-1861.
[13] MAJUMDER A,RASTOGI R,VANAMA S.Scalable regular expression matching on data streams[C]//Proceedings of ACM SIGMOD International Conference on Management of Data.New York,USA:ACM Press,2008:161-172.
[14] ROHRER J,ATASU K,LUNTEREN J,et al.Memory-efficient distribution of regular expressions for fast deep packet inspection[C]//Proceedings of the 7th IEEE/ACM International Conference on Hardware/Software Codesign and System Synthesis.New York,USA:ACM Press,2009:147-154.
[15] WATSON B W.A new regular grammar pattern matching algorithm[J].Theoretical Computer Science,1996,299(1/2/3):364-377.
[16] NAVARRO G.NR-grep:a fast and flexible pattern-matching tool[J].Practice and Experience,2001,31(13):1265-1312.
[17] YANG X,QIU T,WANG B,et al.Negative factor:improving regular-expression matching in strings[J].ACM Transactions on Database Systems,2016,40(4).