Research on Transaction Sequence Data Mining
Abstract
Transaction sequence data describe how the prices of commodities or securities change over time across various kinds of transactions. Analyzing such data gives merchants and investors a quantitative basis for designing marketing strategies or choosing value-investing methods, which has made transaction sequence data mining an active topic in both research and application.
     Transaction sequence data mining aims to identify regularities in how commodity or security prices change. Its main tasks include classification, clustering, association analysis, and anomaly detection. It also supports extended analyses, such as association rules with time-interval constraints and pattern analysis on data with missing values.
     At present, much of the research on transaction sequence data borrows methods from other kinds of sequence data mining. One approach treats the discrete time order of the sequence elements as continuous and combines structured or unstructured time-series models with various complex algorithms. A second ignores the numeric element values and builds features into event sequences for frequent pattern mining. A third encodes the numeric element values as characters and searches for character sequence patterns. These approaches share two problems. First, they do not account simultaneously for the two intrinsic properties of transaction sequence data: discrete time order and numeric element values. Second, they do not exploit the available domain knowledge from economics and finance. Respecting the inherent properties of transaction sequences while efficiently finding frequent similar patterns that are meaningful in the domain makes the results of analysis and mining more useful.
     Starting from basic transaction sequence patterns, this paper defines five primitive patterns (Flat, Top, Bottom, Growth, and Decline) and the associations among them, called composite patterns. It focuses on three problems: mining transaction sequence patterns, querying and predicting with them, and clustering based on them.
     The main contributions are as follows:
     (1) For pattern mining, this paper proposes an algorithm for mining frequent composite patterns, built on two algorithms: fast search for primitive patterns and mining of their top-K frequent items.
     A frequent composite pattern consists of frequent sets of several primitive patterns that satisfy certain time constraints and a periodic (cyclic) relationship. Because the space of candidate primitive patterns grows exponentially, efficiency becomes the bottleneck in this mining task.
     First, five primitive patterns are defined from domain knowledge. A general similarity measure for sequence patterns based on a stretchable distance function is proposed, together with ways of computing it that use trend merging and a symmetric application of the distance function. The four primitive patterns that are similar under "stretching" (all except the Flat pattern) are each converted into an undirected similarity graph and clustered spectrally. Then, using the resulting clusters as approximations of maximal cliques, time constraints are introduced in place of the Flat pattern to find frequent composite patterns made up of frequent sets of the various primitive patterns. On real stock transaction sequences, several similarity computations are compared to assess the algorithm's accuracy, and the frequent composite patterns found have sensible interpretations in practice.
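     The abstract does not give the definition of the stretchable distance function, so the following is only a minimal sketch of the general idea of a time-scale-tolerant, symmetric distance. It assumes a stand-in technique (linear resampling of both patterns to a common length, then mean absolute difference); the names `resample` and `stretch_distance` and the parameter `n` are hypothetical and do not come from the original work.

```python
def resample(seq, n):
    """Linearly interpolate seq onto n evenly spaced positions."""
    if n < 2 or len(seq) < 2:
        raise ValueError("need at least two points")
    step = (len(seq) - 1) / (n - 1)
    out = []
    for i in range(n):
        pos = i * step
        lo = min(int(pos), len(seq) - 1)
        hi = min(lo + 1, len(seq) - 1)
        frac = pos - lo
        out.append(seq[lo] * (1 - frac) + seq[hi] * frac)
    return out


def stretch_distance(a, b, n=32):
    """Assumed stand-in: mean absolute difference after rescaling time.

    Both patterns are stretched or shrunk to n points, so two patterns
    with the same shape but different durations get a small distance.
    The measure is symmetric in a and b by construction.
    """
    ra, rb = resample(a, n), resample(b, n)
    return sum(abs(x - y) for x, y in zip(ra, rb)) / n


if __name__ == "__main__":
    short_rise = [1, 2, 3]
    long_rise = [1, 1.5, 2, 2.5, 3]
    peak = [1, 3, 1]
    print(stretch_distance(short_rise, long_rise))  # close to zero
    print(stretch_distance(short_rise, peak))       # clearly larger
```

     A pairwise matrix of such distances could then be thresholded or converted to affinities to form the undirected similarity graph mentioned above; that step is not shown here.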
     (2) For pattern querying, this paper proposes two effective similarity query algorithms.
     In practice, transaction sequences exhibit an important kind of similarity, "stretchable" similarity: a pattern may be elastically lengthened or shortened along the time dimension while keeping its overall trend along the value dimension. Defining a similarity measure that captures this is therefore an important problem.
     To handle small variations between sequences, the first algorithm merges the monotonic intervals of the query sequence, generates candidate sequence patterns according to the length and amplitude ratios of the intervals, and then computes similarity with the stretchable distance function and returns the results. To handle changes in price range, the second algorithm normalizes all sequences and computes the query results with a refined definition of the stretchable distance function. Experiments show that both algorithms, "trend merging" and "price merging", find all patterns whose overall shape is an "enlarged" or "reduced" version of the given sequence pattern.
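     The merging of monotonic intervals can be illustrated with a small sketch. The abstract does not specify the merging rule, so this assumes a simple one: a reversal counts as a new interval only if it moves at least `eps` away from the running extreme, and smaller wiggles are absorbed into the current trend. The function names and the `eps` parameter are hypothetical.

```python
def merge_trends(seq, eps):
    """Return turning-point indices, ignoring reversals smaller than eps."""
    if len(seq) < 2:
        return list(range(len(seq)))
    pivots = [0]
    direction = 0  # +1 rising, -1 falling, 0 not yet known
    ext = 0        # index of the running extreme of the current trend
    for i in range(1, len(seq)):
        delta = seq[i] - seq[ext]
        if direction == 0:
            if abs(delta) >= eps:
                direction = 1 if delta > 0 else -1
                ext = i
        elif direction * delta > 0:
            ext = i  # the trend continues, so move the extreme forward
        elif abs(delta) >= eps:
            pivots.append(ext)  # reversal is large enough: close the interval
            direction = -direction
            ext = i
    pivots.append(len(seq) - 1)
    return pivots


def interval_features(seq, pivots):
    """Describe each merged interval by (length, amplitude)."""
    return [(b - a, seq[b] - seq[a]) for a, b in zip(pivots, pivots[1:])]


if __name__ == "__main__":
    prices = [1, 2, 1.9, 3, 4, 3.9, 2, 1]
    pivots = merge_trends(prices, eps=0.5)
    print(pivots)
    print(interval_features(prices, pivots))
```

     The (length, amplitude) pairs are the kind of per-interval quantities from which length and amplitude ratios could be formed when generating candidates.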
     (3) For prediction, this paper proposes a sequence pattern trend prediction algorithm with relatively high accuracy.
     Given a transaction sequence dataset, prediction estimates the numeric values of a query sequence at subsequent time points. Because the data vary in complex ways, predicting the trend is more meaningful than predicting exact values, so improving the accuracy of trend prediction for a given sequence is the key issue.
     Building on the "price merging" similarity query, this paper uses Parzen window density estimation and KNN estimation, separately, to show that a weighted average of the length-τ patterns that follow the top-k results in the candidate set can approximately stand in for the full set of query results, from which the prediction is synthesized. Experiments on real stock transaction sequences show that the trend prediction achieves relatively high accuracy.
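     The averaging step can be sketched as follows. This only illustrates taking the k most similar historical windows and averaging what followed them; it is not the paper's algorithm or its proof. The distance (mean absolute difference after dividing each window by its first price), the inverse-distance weights, and the name `predict_trend` are all assumptions, and prices are assumed to be positive.

```python
def predict_trend(history, query, k=3, tau=2):
    """Forecast tau relative price levels after a pattern like `query`.

    Each returned value is a price relative to the last price of the
    matched window, so a final value above 1 suggests an upward trend.
    """
    m = len(query)
    q = [v / query[0] for v in query]  # scale out the price level
    candidates = []
    for start in range(len(history) - m - tau + 1):
        window = history[start:start + m]
        w = [v / window[0] for v in window]
        dist = sum(abs(a - b) for a, b in zip(q, w)) / m
        future = [v / window[-1] for v in history[start + m:start + m + tau]]
        candidates.append((dist, future))
    if not candidates:
        raise ValueError("history is too short for this query and tau")
    candidates.sort(key=lambda c: c[0])
    top = candidates[:k]
    # Assumed weighting: closer matches count more (inverse distance).
    weights = [1.0 / (d + 1e-9) for d, _ in top]
    total = sum(weights)
    return [
        sum(wt * fut[i] for wt, (_, fut) in zip(weights, top)) / total
        for i in range(tau)
    ]


if __name__ == "__main__":
    history = [10, 11, 12, 13, 14, 10, 11, 12, 13, 14]
    forecast = predict_trend(history, query=[20, 22, 24], k=2, tau=2)
    print(forecast)
    print("up" if forecast[-1] > 1 else "down or flat")
```

     Reporting only the direction of the final value matches the emphasis above on trend rather than exact values.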
     (4) For clustering, this paper proposes a clustering algorithm whose objective function takes time-limit constraints into account.
     Choosing which objects to cluster is crucial. Within a given time span, an overall rising or falling trend reflects the price behavior of a commodity or security better than the raw series does. Extracting such growth or decline patterns, which capture local information, from the original transaction sequences and building features from them is therefore more meaningful than clustering the raw sequences directly.
     First, the internal structure of a set of transaction sequences is studied from the perspective of commodity or security prices and their trends. A growth or decline pattern reflecting the price trend is defined, along with two progressively refined similarity measures: the shifted-window combined distance and the angle vector distance. On this basis, an objective function with time-limit constraints is designed for a clustering procedure that first partitions and then merges hierarchically. Experiments show that, under time-limit constraints, this feature extraction and the two distance functions between patterns produce good clusterings that can be interpreted well.
