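A new time-series operation optimization method based on Levenshtein-distance hierarchical clustering (一种新型的基于Levenshtein距离层次聚类的时序操作优化方法)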

Published English title: New operation optimization method with time series based on Levenshtein distance hierarchical clustering
Authors: ZHU Jian (朱坚); YANG Bo (杨博); WANG Yongjian (王永健); TANG Xiaojie (唐晓婕); LI Hongguang (李宏光)
Keywords: time series; Levenshtein distance; hierarchical clustering; operational optimization; distillation
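Journal (Chinese name): HGSZ
Journal (English name): CIESC Journal
Affiliation: College of Information Science & Technology, Beijing University of Chemical Technology
Publication date: 2018-12-04 17:27
Publisher: CIESC Journal (化工学报)
Year: 2019
Issue: v.70
Language: Chinese
Page: HGSZ201902020
Page count: 9
CN: 02
ISSN: 11-1946/TQ
Classification number: 161-169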

Abstract

In modern process industries, the distributed control system (DCS) collects and stores large volumes of operational time-series data. If the valuable operating experience and operating information contained in these data can be extracted, the performance of the operating system can be improved considerably. Operating experience, however, is a vague concept that cannot be quantified directly. The time-series operating data are therefore symbolized so that operating experience is represented in block form, and a time-series hierarchical agglomerative clustering algorithm based on the Levenshtein distance is proposed. A similarity search over the historical time-series operating data of the manipulated variables yields several groups of similar operating modes. The process variables corresponding to each type of operating mode are then analyzed for performance, so that the operating experience needed in actual operation is obtained and stored, with the aim of optimizing the operation of the production process. To verify the proposed method, it is applied to a continuous multicomponent distillation process, and the experimental results demonstrate the effectiveness of the proposed operation optimization method based on Levenshtein-distance hierarchical clustering.
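As a rough illustration of the pipeline the abstract describes (symbolize the operating time series, compare the resulting strings by Levenshtein distance, then merge them bottom-up), the following Python sketch may help. The equal-width binning in `symbolize`, the average-linkage rule, the fixed target number of clusters, and all function and parameter names are assumptions made for illustration only; the abstract does not specify these details.

```python
from itertools import combinations


def symbolize(series, lo, hi, n_segments=4, alphabet="abc"):
    """Map a numeric series to a string: segment means -> equal-width bins.

    Assumed scheme, not taken from the paper. Trailing samples that do not
    fill a whole segment are ignored.
    """
    seg_len = len(series) // n_segments
    means = [
        sum(series[i * seg_len:(i + 1) * seg_len]) / seg_len
        for i in range(n_segments)
    ]
    width = (hi - lo) / len(alphabet) or 1.0  # guard against a zero-width range
    return "".join(
        alphabet[min(int((m - lo) / width), len(alphabet) - 1)] for m in means
    )


def levenshtein(a, b):
    """Edit distance (insert, delete, substitute) by row-wise dynamic programming."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,                # deletion
                            curr[j - 1] + 1,            # insertion
                            prev[j - 1] + (ca != cb)))  # substitution or match
        prev = curr
    return prev[-1]


def agglomerative(strings, n_clusters):
    """Bottom-up clustering with average linkage (assumed) until n_clusters remain."""
    clusters = [[i] for i in range(len(strings))]
    dist = {
        (i, j): levenshtein(strings[i], strings[j])
        for i, j in combinations(range(len(strings)), 2)
    }

    def linkage(c1, c2):
        total = sum(dist[min(i, j), max(i, j)] for i in c1 for j in c2)
        return total / (len(c1) * len(c2))

    while len(clusters) > n_clusters:
        # Merge the closest pair of clusters (p < q by construction).
        p, q = min(combinations(range(len(clusters)), 2),
                   key=lambda pq: linkage(clusters[pq[0]], clusters[pq[1]]))
        clusters[p] = clusters[p] + clusters[q]
        del clusters[q]
    return clusters


if __name__ == "__main__":
    modes = ["aabbc", "aabcc", "ccbaa", "cbbaa"]  # hypothetical symbolized operations
    print(agglomerative(modes, n_clusters=2))
```

Each resulting cluster holds the indices of operation sequences with similar symbolic shapes; in the setting of the abstract, the process variables associated with each cluster would then be compared to judge which operating mode performs best.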

References

[1] Piatetsky-Shapiro G. The data-mining industry coming of age[J]. IEEE Intelligent Systems, 1999, 14(6):32-34.
[2] Rossiter J A, Kouvaritakis B. Modelling and implicit modelling for predictive control[J]. International Journal of Control, 2001, (11):1085-1095.
[3] Favoreel W, De Moor B, Van Overschee P. Subspace state space system identification for industrial processes[J]. Journal of Process Control, 2000, (2):149-155.
[4] Braha D, Shmilovici A. Data mining source code for improving a cleaning process in the semiconductor industry[J]. IEEE Transactions on Semiconductor Manufacturing, 2002, 15(1):91-101.
[5] Dong L X, Xiao D M, Liu Y L. Rough set and radial basis function neural network based insulation data mining fault diagnosis for power transformer[J]. Journal of Harbin Institute of Technology, 2007, 14(2):263-26.
[6] Yang Q, Wang X. Challenging problems in data mining research[J]. Int. J. of Information Technology and Decision Making, 2006, 5(4):597-604.
[7] Agrawal R, Psaila G, Wimmers E, et al. Querying shapes of histories[C]//Proceedings of the 21st Int'l Conf. on Very Large Database (VLDB'95). San Francisco: Morgan Kaufmann Publishers, 1995:502-514.
[8] Keogh E, Lin J. Clustering of time-series subsequences is meaningless: implications for previous and future research[J]. Knowledge and Information Systems, 2005, 8(2):154-177.
[9] Berndt D J, Clifford J. Using dynamic time warping to find patterns in time series[C]//Proceedings of the AAAI-94 Workshop on Knowledge Discovery in Databases. Seattle, Washington: KDD Workshop, 1994:359-370.
[10] Wang H, Su H, Zheng K, et al. An effectiveness study on trajectory similarity measures[C]//Proceedings of the 24th Australasian Database Conf. Darlinghurst: Australian Computer Society, 2013:13-22.
[11] Akatsuka S, Noda M. Similarity analysis of sequential alarms in plant operation data by using Levenshtein distance[C]//Proceedings of the 6th International Conference on Process Systems Engineering (PSE ASIA). Kagaku: Kagaku Ronbunshu, 2013:25-27.
[12] Lin J, Keogh E, Lonardi S, et al. A symbolic representation of time series, with implications for streaming algorithms[C]//Proceedings of the ACM SIGMOD Workshop on Research Issues in Data Mining and Knowledge Discovery. USA: ACM, 2003:2-11.
[13] Keogh E, Chakrabarti K, Pazzani M, et al. Dimensionality reduction for fast similarity search in large time series databases[J]. Knowl. Inf. Syst., 2001, 3(3):263-286.
[14] Chakrabarti K, Keogh E, Mehrotra S, et al. Locally adaptive dimensionality reduction for indexing large time series databases[J]. ACM Trans. Database Syst., 2002, (27):188-228.
[15] Goldin D Q, Kanellakis P C. On similarity queries for time series data: constraint specification and implementation[M]//International Conference on Principles and Practice of Constraint Programming. Berlin: Springer Press, 1995:137-153.
[16] Tan S C, San Lau P, Yu X W. Finding similar time series in sales transaction data[C]//International Conference on Industrial, Engineering and Other Applications of Applied Intelligent Systems. Berlin: Springer International, 2015:645-654.
[17] Loh W K, Kim S W, Whang K Y. A subsequence matching algorithm that supports normalization transform in time series databases[J]. Data Mining and Knowledge Discovery, 2004, 9(1):5-28.
[18] Berndt D J, Clifford J. Using dynamic time warping to find patterns in time series[M]//KDD Workshop. Washington: KDD Press, 1994:359-370.
[19] Fu A W C, Keogh E, Lau L Y, et al. Scaling and time warping in time series querying[J]. The International Journal on Very Large Data Bases, 2008, 17(4):899-921.
[20] Bankó Z, Abonyi J. Correlation based dynamic time warping of multivariate time series[J]. Expert Systems with Applications, 2012, 39(17):12814-12823.
[21] Dai D B, Tang C L, Xiong Y. Sequence clustering algorithm based on global and local similarity[J]. Journal of Software, 2010, 21(4):702-717. (in Chinese)
[22] Levenshtein V. Binary codes capable of correcting deletions, insertions, and reversals[J]. Soviet Physics Doklady, 1966, 10(8):707-710.
[23] Iliopoulos C S, Rahman M S. New efficient algorithms for the LCS and constrained LCS problems[J]. Information Processing Letters, 2008, 106(1):13-18.
[24] Wagner R A, Fischer M J. The string-to-string correction problem[J]. Journal of the ACM, 1974, 21(1):168-173.
[25] Silva J D A, Hruschka E R. Extending k-means-based algorithms for evolving data streams with variable number of clusters[C]//International Conference on Machine Learning and Applications and Workshops (ICMLA). Hawaii: IEEE, 2011, 2:14-19.
[26] Wang Y, Tang J, Rao Q F, et al. High efficiency K-means optimal cluster number determination algorithm[J]. Journal of Computer Applications, 2014, 34(5):1331-1335. (in Chinese)
[27] Celebi M E, Kingravi H A, Vela P A. A comparative study of efficient initialization methods for the k-means clustering algorithm[J]. Expert Systems with Applications, 2013, 40(1):200-210.
[28] Han J, Kamber M. Data Mining: Concepts and Techniques[M]. San Francisco: Morgan Kaufmann, 2001.
[29] Narasimhan M, Jojic N, Bilmes J. Q-clustering[J]. Neural Information Processing Systems, 2005, 17:1537-1544.
[30] Li J F, Li J S, He H Q. A simple and accurate approach to hierarchical clustering[J]. Journal of Computational Information Systems, 2011, 7(7):2577-2584.
[31] Xu D, Tian Y. A comprehensive survey of clustering algorithms[J]. Ann. Data Sci., 2015, 2(2):165-193.
