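Research on High-Performance Data Stream Pattern Discovery Algorithms and Their Applications

English title: High Performance Data Stream Pattern Discovery Algorithms and Their Applications
Author: 周黔 (Zhou Qian)
Degree level: Doctoral
Discipline: Control Science and Engineering
Chinese keywords: 数据流 (data stream) ; 模式发现 (pattern discovery) ; 趋势提取 (trend extraction) ; 变化检测 (change detection) ; 离群点检测 (outlier detection) ; 聚类分析 (cluster analysis)
English keywords: data stream ; pattern discovery ; trend extraction ; change detection ; outlier detection ; clustering
Degree year: 2008
Advisor: 吴铁军 (Wu Tiejun)
Discipline code: 081101
Degree-granting institution: 浙江大学 (Zhejiang University)
Submission date: 2008-04-01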

Abstract

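With advances in sensor technology and network computing, data streams have become a pervasive form of data, widely used in network monitoring, environmental monitoring, industrial control, and financial analysis. These applications share common characteristics: the data must be analyzed continuously in real time or near real time, the data volume is extremely large, and the data arrive at high speed as a stream. The traditional "store first, then process" data mining model has difficulty handling such high-rate, transient data streams, so mining data streams poses entirely new challenges for data mining.
     Data streams contain many kinds of patterns, and discovering them quickly and effectively is the core problem of many practical applications. In recent years, pattern discovery in data streams has become one of the most challenging research topics in data mining. This dissertation aims to improve the performance of pattern discovery algorithms by introducing robust mechanisms and incremental forgetting mechanisms, and applies these algorithms to the analysis of industrial production processes in order to improve product quality. The main research results are as follows:
     1) A real-time trend extraction algorithm for data streams is proposed that combines incremental recursive least-squares regression parameter estimation, a method from system identification, with the generalized likelihood ratio test. For each newly arriving stream element, the algorithm determines the parameters of a linear regression model incrementally, uses the generalized likelihood ratio test to decide segment boundary points, and automatically produces a piecewise trend of the data stream. Compared with existing trend extraction algorithms, it is both faster and more accurate;
     2) A data-driven, robust online algorithm for detecting pattern changes in data streams is proposed. The algorithm first samples the data stream with two adjacent time windows of a given length. It then uses support vector data description to map the two sampled subsets into a normalized high-dimensional feature space and builds, for each subset, a minimum enclosing hypersphere model of its image (excluding outliers). Finally, it computes the cosine of the angle between the center vectors of the two hyperspheres as a measure of the similarity of the two subsets, and detects pattern changes from it. The algorithm requires no prior knowledge, is unaffected by outliers, and is therefore robust;
     3) An outlier detection algorithm based on recent-biased dynamic least-squares support vector regression (RBDLS-SVR) is proposed. Because the model is built with RBDLS-SVR, the SVM learning problem is converted into solving a system of linear equations, and an incremental forgetting mechanism tracks the dynamics of the data stream with high accuracy. This avoids the drawback of standard SVR modeling applied to data stream regression, where the solution must be recomputed from scratch every time a sample is added or removed. The algorithm is fast and accurate, and detects outliers in data streams effectively;
     4) A recent-biased clustering algorithm for data streams based on tilted time windows is proposed. The algorithm first partitions the data in the sliding window into equal-length, non-overlapping blocks called basic windows, and then extracts the features of each basic window with the Haar wavelet transform. By varying the number of wavelet coefficients retained per basic window, it preserves more detail of the most recent data: more recent basic windows keep more wavelet coefficients, and older ones keep fewer. Finally, a recent-biased distance between data streams is defined, which completes the recent-biased clustering algorithm based on tilted time windows. The algorithm is fast and performs recent-biased clustering of data streams efficiently;
     5) The application of data stream pattern discovery to a real production process is described. For complex steel production process data, the proposed algorithms are applied to two mining tasks: outlier detection and abrupt change detection. Theory and practice indicate that the proposed algorithms are promising for the analysis of large-scale industrial process data.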
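The trend extraction scheme in item 1) can be illustrated with a minimal Python sketch. It is not the dissertation's algorithm: the recursive least-squares update is the standard textbook form, but a fixed prediction-error threshold stands in for the generalized likelihood ratio test, and the names `RLSLine`, `segment_stream`, `tol`, and `min_points` are hypothetical.

```python
class RLSLine:
    """Recursive least-squares fit of y = theta[0] + theta[1] * t."""

    def __init__(self, p0=1e6):
        self.theta = [0.0, 0.0]
        self.P = [[p0, 0.0], [0.0, p0]]  # large P0 = weak prior on theta
        self.n = 0

    def predict(self, t):
        return self.theta[0] + self.theta[1] * t

    def update(self, t, y):
        x = (1.0, float(t))
        P = self.P
        Px = (P[0][0] * x[0] + P[0][1] * x[1],
              P[1][0] * x[0] + P[1][1] * x[1])
        denom = 1.0 + x[0] * Px[0] + x[1] * Px[1]
        k = (Px[0] / denom, Px[1] / denom)       # gain vector
        err = y - self.predict(t)                # a-priori prediction error
        self.theta[0] += k[0] * err
        self.theta[1] += k[1] * err
        # P <- P - k (x^T P); x^T P equals Px because P is symmetric
        self.P = [[P[i][j] - k[i] * Px[j] for j in range(2)] for i in range(2)]
        self.n += 1


def segment_stream(samples, tol=0.5, min_points=3):
    """Return segment start indices and the model of the last segment.

    Assumption: a new segment starts when the prediction error of the
    incoming sample exceeds `tol` (a stand-in for the GLR decision).
    """
    model, start, bounds = RLSLine(), 0, [0]
    for i, y in enumerate(samples):
        if model.n >= min_points and abs(y - model.predict(i - start)) > tol:
            model, start = RLSLine(), i          # restart on a new segment
            bounds.append(i)
        model.update(i - start, y)               # time is local to the segment
    return bounds, model
```

Each sample costs a constant number of operations, which is the property that makes this kind of incremental fit attractive for streams; the threshold test would be replaced by a proper statistical test in practice.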
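     In summary, this dissertation studies high-performance data stream pattern discovery algorithms and their application to industrial production processes. These algorithms supplement or improve existing data stream pattern discovery methods. Theory and experiments both indicate that, compared with existing algorithms, the proposed algorithms have clear advantages in performance (processing speed, accuracy, and robustness).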
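The similarity measure in item 2), the cosine of the angle between two hypersphere centers in feature space, can be computed with kernel evaluations alone. The sketch below is only an illustration under stated assumptions: uniform coefficients (the kernel mean of each window) stand in for the coefficients a support vector data description solver would return, so outliers are not excluded here, and the names `center_cosine`, `detect_changes`, `width`, and `threshold` are hypothetical. The RBF kernel satisfies k(x, x) = 1, so every mapped point has unit norm.

```python
import math


def rbf(x, y, sigma=1.0):
    """Gaussian (RBF) kernel for scalar samples."""
    return math.exp(-((x - y) ** 2) / (2.0 * sigma ** 2))


def center_cosine(win_a, win_b, alpha=None, beta=None, kernel=rbf):
    """Cosine between a = sum_i alpha_i phi(x_i) and b = sum_j beta_j phi(y_j).

    Assumption: if no coefficients are given, uniform weights are used
    in place of the solution of an SVDD optimization.
    """
    alpha = alpha or [1.0 / len(win_a)] * len(win_a)
    beta = beta or [1.0 / len(win_b)] * len(win_b)

    def inner(u, cu, v, cv):
        # <sum_i cu_i phi(u_i), sum_j cv_j phi(v_j)> via the kernel trick
        return sum(ci * cj * kernel(xi, xj)
                   for xi, ci in zip(u, cu) for xj, cj in zip(v, cv))

    ab = inner(win_a, alpha, win_b, beta)
    aa = inner(win_a, alpha, win_a, alpha)
    bb = inner(win_b, beta, win_b, beta)
    return ab / math.sqrt(aa * bb)


def detect_changes(stream, width, threshold=0.5):
    """Indices at which two adjacent windows of `width` samples disagree."""
    hits = []
    for t in range(2 * width, len(stream) + 1):
        old = stream[t - 2 * width:t - width]
        new = stream[t - width:t]
        if center_cosine(old, new) < threshold:
            hits.append(t - 1)   # index of the newest sample in `new`
    return hits
```

A cosine near 1 means the two windows occupy the same region of feature space; a value near 0 means they are nearly orthogonal there, which is read as a pattern change.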
With the rapid development of sensor and network technology, applications such as network traffic management, environmental monitoring, industrial control, and financial analysis generate large volumes of stream data. These applications share several distinguishing features: the need for continuous analysis in real time or near real time, huge data volumes, and high arrival rates. Traditional "store, then analyze" data mining models are ill-equipped for mining high-rate, transient data streams, so mining data streams poses many new challenges.
     Data streams contain many patterns, and discovering and identifying these patterns efficiently is the core problem of many applications. In recent years, pattern discovery in data streams has become one of the most challenging research topics. To improve the performance of pattern discovery algorithms for data streams, this dissertation introduces robust and incremental mechanisms and applies the resulting algorithms to industrial process analysis. The main contributions are as follows:
     1) By combining an incremental recursive least-squares algorithm for regression parameter estimation with the generalized likelihood ratio test for change-point detection, a real-time trend extraction algorithm for dynamic data streams is proposed. To segment the data stream automatically and extract its trend, the algorithm estimates the parameters of a linear regression incrementally and detects segment boundary points with the generalized likelihood ratio test. Compared with the best existing algorithms in the same field, it achieves markedly faster computation and higher trend analysis accuracy;
     2) A robust, data-driven online algorithm for detecting changes in data streams is presented. First, the data stream is sampled by two adjacent windows of a given length. Then the sampled data are projected into a normalized high-dimensional feature space, and a minimum hypersphere model is constructed for each of the two windowed data sets (with outliers excluded). Finally, changes are detected by computing the cosine of the angle between the centers of the two hyperspheres. The algorithm is robust and requires no prior knowledge;
     3) A data stream outlier detection algorithm based on recent-biased dynamic least-squares support vector regression is proposed. Because the model is a recent-biased dynamic least-squares support vector regression, the learning problem reduces to solving a system of linear equations, and an incremental and decremental learning mechanism tracks the dynamics of the data stream accurately. The algorithm overcomes the shortcoming of standard support vector regression, which must be recomputed from scratch whenever a sample is added or deleted; it is both fast and accurate, and detects outliers in data streams efficiently;
     4) A recent-biased clustering algorithm for data streams based on a tilted time window is proposed. First, the algorithm divides the sliding window into equal-length, non-overlapping data blocks (basic windows). It then extracts the features of each data block with the Haar wavelet transform and preserves the detail of recent data by varying the number of wavelet coefficients kept per block: the more recent the block, the more coefficients are preserved, and vice versa. Finally, by defining a recent-biased distance between data streams, it implements recent-biased clustering based on the tilted time window. The algorithm achieves markedly faster computation and higher efficiency;
     5) The proposed data stream pattern discovery algorithms are applied to a real industrial process. Based on the characteristics of complex process data from iron and steel making, two pattern discovery tasks are implemented: outlier detection and pattern change detection. The results show that the proposed algorithms are promising for analyzing data generated by complex industrial processes.
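Item 3) rests on the fact that least-squares support vector regression reduces training to one linear system. The sketch below shows that system with per-sample weights that decay with age, following the weighted LS-SVM formulation; it is an assumption that this resembles the dissertation's recent-biased model, and the sketch re-solves the system from scratch instead of updating it incrementally. The names `fit_weighted_lssvr`, `outlier_score`, `gamma`, and `forget` are hypothetical.

```python
import math


def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(M[r][k] * x[k] for k in range(r + 1, n))
        x[r] = (M[r][n] - s) / M[r][r]
    return x


def rbf(x, z, sigma=2.0):
    """Gaussian (RBF) kernel for scalar inputs."""
    return math.exp(-((x - z) ** 2) / (2.0 * sigma ** 2))


def fit_weighted_lssvr(xs, ys, gamma=100.0, forget=0.95, kernel=rbf):
    """Solve [[0, 1^T], [1, K + diag(1/(gamma*v_i))]] [b; alpha] = [0; y]."""
    n = len(xs)
    v = [forget ** (n - 1 - i) for i in range(n)]  # newest sample has weight 1
    A = [[0.0] + [1.0] * n]
    for i in range(n):
        row = [1.0] + [kernel(xs[i], xs[j]) for j in range(n)]
        row[i + 1] += 1.0 / (gamma * v[i])         # weaker fit for old samples
        A.append(row)
    sol = solve(A, [0.0] + list(ys))
    b, alpha = sol[0], sol[1:]
    return lambda x: b + sum(a * kernel(xi, x) for a, xi in zip(alpha, xs))


def outlier_score(model, x, y):
    """Absolute residual of a new sample against the fitted model."""
    return abs(y - model(x))
```

A sample whose score exceeds a chosen threshold would be flagged as an outlier. Avoiding the full re-solve on every arrival is exactly what the incremental and decremental mechanism described above is for.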
     In sum, this dissertation studies several high-performance pattern discovery algorithms and their applications; they improve and supplement existing algorithms. Theory and simulation results show that, compared with existing algorithms in the same field, the proposed algorithms perform better in accuracy, computational speed, and robustness.
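The tilted-time-window summary of item 4) can be sketched as follows. This is a minimal illustration, not the dissertation's method: the Euclidean distance over the retained coefficients is an assumed form of the recent-biased distance, block lengths are assumed to be powers of two, and the names `haar`, `summarize`, `recent_biased_distance`, and `keep` are hypothetical.

```python
import math


def haar(block):
    """Orthonormal Haar transform of a block whose length is a power of two.

    Output order: [approximation, coarsest detail, ..., finest details].
    """
    coeffs = [float(v) for v in block]
    details = []
    while len(coeffs) > 1:
        half = len(coeffs) // 2
        avg = [(coeffs[2 * i] + coeffs[2 * i + 1]) / math.sqrt(2)
               for i in range(half)]
        det = [(coeffs[2 * i] - coeffs[2 * i + 1]) / math.sqrt(2)
               for i in range(half)]
        details = det + details   # coarser details go in front of finer ones
        coeffs = avg
    return coeffs + details


def summarize(window, block_len, keep):
    """Split `window` into basic windows and keep keep[i] leading coefficients.

    `keep` is ordered oldest block first, so a tilted schedule such as
    [1, 2, 4] retains the most detail for the most recent block.
    """
    blocks = [window[i:i + block_len] for i in range(0, len(window), block_len)]
    return [haar(b)[:k] for b, k in zip(blocks, keep)]


def recent_biased_distance(sum_a, sum_b):
    """Euclidean distance over the coefficients both summaries retain."""
    return math.sqrt(sum((x - y) ** 2
                         for ca, cb in zip(sum_a, sum_b)
                         for x, y in zip(ca, cb)))
```

Because old blocks keep only their coarse coefficients, two streams that differ only in the fine detail of old data are treated as identical, while the same difference in recent data still contributes to the distance.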
