无间隙约束下无重叠模式匹配的在线求解算法
详细信息    查看全文 | 推荐本文 |
  • 英文篇名:On-line Algorithm of Non-overlapping Pattern Matching Without Gap Constraints
  • 作者:柴欣 ; 王建姣 ; 闫文杰 ; 武优西
  • 英文作者:CHAI Xin;WANG Jian-jiao;YAN Wen-jie;WU You-xi;School of Artificial Intelligence,Hebei University of Technology;Hebei Province Key Laboratory of Big Data Calculation;
  • 关键词:模式匹配 ; 间隙约束 ; 无重叠条件 ; 在线求解算法
  • 英文关键词:pattern matching;;gap constraint;;non-overlapping condition;;on-line algorithm
  • 中文刊名:XXWX
  • 英文刊名:Journal of Chinese Computer Systems
  • 机构:河北工业大学人工智能与数据科学学院;河北省大数据计算重点实验室;
  • 出版日期:2019-07-15
  • 出版单位:小计算机系统
  • 年:2019
  • 期:v.40
  • 基金:国家自然科学基金项目(61702157)资助;; 黑龙江省自然科学基金项目(F2017019)资助
  • 语种:中文;
  • 页:XXWX201907026
  • 页数:5
  • CN:07
  • ISSN:21-1106/TP
  • 分类号:133-137
摘要
间隙约束序列模式挖掘可以有效地挖掘满足用户特定需要的频繁模式,其核心是间隙约束模式匹配问题.无重叠的模式匹配问题是其中的一种方法,即任何两个出现的相同位置不能共用序列的同一位置的字符.但在无先验知识的情况下,如何设定间隙是难以解决的问题.针对此问题,本文设计了在线匹配算法SNGP-Best,其依据序列串来计算满足查询模式的最多出现数.该算法通过计算模式的长度来确立队列的个数,然后采用在线计算的方式,能够及时计算出满足条件的出现并输出,起到了降低算法空间复杂性的作用.实验结果验证SNGP-Best算法具有良好的求解性能.
        Sequence pattern mining with gap constraints can effectively mine frequent patterns that meet the specific needs of users. The core of this mining problem is the pattern matching with gap constraint which has diversified forms. The non-overlapping pattern matching problem is one of them,which means that any two occurrences cannot use the same character of the sequence at the same position of the pattern. However,how to set the gap without prior knowledge is a difficult problem. To tackle this issue,this paper designs an online matching algorithm,named SNGP-Best,which calculates the maximum number of occurrences that satisfy the query mode based on the sequence string. This algorithm establishes the number of queues by calculating the length of the pattern,and then uses the online calculation method to calculate the occurrence of specific satisfied conditions and output it timely,which reduces the complexity of the algorithm space. The experimental results validate the competitive performance in solving of the SNGP-Best.
引文
[1]Shi Hao-hong,Yang Wei-dong.Research on method of multiple data sources schema matching based on global ontology[J].Journal of Chinese Computer Systems,2016,37(6):1148-1152.
    [2]Wu XD,Zhu XQ,He Y,et al.Pattern mining from biological sequences with wildcard constraints[J].Computers in Biology and Medicine,2013,43(5):481-492.
    [3]Chai Xin,Jia Xiao-fei,Wu You-xi,et al.Strict pattern matching with general gaps and one-off condition[J].Journal of Software,2015,26(5):1096-1112.
    [4]Xuan J,Jiang H,Hu Y,et al.Towards effective bug triage with software data reduction techniques[J].IEEE Transactions on Knowledge&Data Engineering,2014,27(1):264-280.
    [5]Dong X,Gong Y,Cao L.F-NSP+:a fast negative sequential patterns mining method with self-adaptive data storage[J].Pattern Recognition,2018,84:13-27.
    [6]Cao Cheng-hong,Lei Ying-ke,Xu Yi-ming.AC-IM algorithm oriented frequent statistics of link layer bit stream data[J].Journal of Chinese Computer Systems,2018,39(7):1436-1440.
    [7]Cao L,Dong X,Zheng Z.E-NSP:efficient negative sequential pattern mining[J].Artificial Intelligence,2016,235:156-182.
    [8]Wu YX,Liu YW,Guo L,et al.Subnettrees for strict pattern matching with general gaps and length constraints[J].Journal of Software,2013,24(5):915-932.
    [9]Aho A V,Corasick M J.Efficient string matching:an aid to bibliographic search[J].Communications of the ACM,1975,18(6):333-340.
    [10]Wu You-xi,Zhou Kun,Liu Jing-yu,et al.Mining sequence pattern with periodic general gap constraints[J].Journal of Computer Science,2017,40(6):1338-1352.
    [11]Tan CD,Min F,Wang M,et al.Discovering patterns with weakwildcard gaps[J].IEEE Access,2017,4(1):4922-4932.
    [12]Wu XD,Qing JP,Xie F.Pattern matching with flexible wildcards[J].Journal of Computer Science and Technology,2014,29(5):740-750.
    [13]Wu YX,Tong Y,Zhu XQ,et al.NOSEP:nonoverlapping sequence pattern mining with gap constraints[J].IEEE Transactions on Cybernetics,2018,48(10):2809-2822.
    [14]Yang Hao,Duan Lei,Hu Bin,et al.Mining top-k distinguishing sequential patterns with gap constraint[J].Journal of Software,2015,26(11):2994-3009.
    [15]Min F,Zhang ZH,Zhai WJ,et al.Frequent pattern discovery with tri-partition alphabets[J].Information Sciences,DOI:10.1016/j.ins.2018.04.013.
    [16]Ding BL,Lo D,Han JW,et al.Efficient mining of closed repetitive gapped subsequences from a sequence database[C]//IEEE International Conference on Data Engineering,IEEE Computer Society,2009:1024-1035.
    [17]Xie Fei,Qiang Ji-peng.Sequential pattern mining with wildcards and non-overlapping condition[J].Journal of Chinese Computer Systems,2017,38(5):956-960.
    [18]Wu YX,Shen C,Jiang H,et al.Strict pattern matching under nonoverlapping condition[J].Science China Information Sciences,2017,60(1):1-16.
    [19]Wang X,Duan L,Dong G,et al.Efficient mining of density-aware distinguishing sequential patterns with gap constraints[C]//International Conference Database Systems for Advanced Applications,Bali,2014,8421:372-387.
    [20]Wang Hui-feng,Duan Lei,Zuo Jie,et al.Efficient mining of distinguishing sequential patterns without a predefined gap constraint[J].Journal of Computer Science,2016,39(10):1979-1991.
    [21]Fan WF,Li JZ,Luo JZ.Incremental graph pattern matching[C]//ACM SIGMOD International Conference on Management of Data,ACM,2011:925-936.
    [22]Nic V.Pattern matching for the masses using custom notations[J].Science of Computer Programming,2012,68(3):609-635.
    [23]Anat BB,David H,Yaron K.CompactDFA:scalable pattern matching using longest prefix match solutions[J].IEEE/ACM Transactions on Networking,2014,22(2):415-428.
    [1]石浩宏,杨卫东.基于全局本体的多数据源模式匹配方法的研究[J].小计算机系统,2016,37(6):1148-1152.
    [3]柴欣,贾晓菲,武优西,等.一般间隙及一次性条件的严格模式匹配[J].软件学报,2015,26(5):1096-1112.
    [6]曹成宏,雷迎科,徐一鸣.面向链路层比特流数据频繁统计的ACIM算法[J].小计算机系统,2018,39(7):1436-1440.
    [10]武优西,周坤,刘靖宇,等.周期性一般间隙约束的序列模式挖掘[J].计算机学报,2017,40(6):1338-1352.
    [14]杨皓,段磊,胡斌,等.带间隔约束的Top-k对序列模式挖掘[J].软件学报,2015,26(11):2994-3009.
    [17]谢飞,强继鹏.满足非重叠条件的带有通配符序列模式挖掘[J].小计算机系统,2017,38(5):956-960.
    [20]王慧锋,段磊,左劼,等.免预设间隙约束的对比序列模式高效挖掘[J].计算机学报,2016,39(10):1979-1991.
    1 http://www.ncbi.nlm.nih.gov
NGLC 2004-2010.National Geological Library of China All Rights Reserved.
Add:29 Xueyuan Rd,Haidian District,Beijing,PRC. Mail Add: 8324 mailbox 100083
For exchange or info please contact us via email.