Clustering-Based Multiple-Model Modeling and Its Application in Soft Sensing
Abstract
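Multiple models can significantly improve estimation accuracy and generalization performance. Motivated by a practical engineering application, this thesis builds multiple-model soft sensors for the crystallization tower unit of the bisphenol A production process, so that process variables can be monitored online.
     Among the many multiple-model modeling methods, clustering-based ones are widely favored. However, the question of how to initialize the number of clusters and the cluster centers has long remained unresolved, which indirectly limits the development of multiple models. Most clustering algorithms are also not robust: when the sample set contains abnormal samples, clustering quality degrades sharply. In addition, clustering algorithms have an inherent defect: when clustering a sample set they use only the input data and ignore the strong influence of the output data on the final result, which limits the validity of the clusters. Finally, the sub-models are the most important part of a multiple model, and their quality directly determines its final accuracy. To address these problems, this thesis improves the clustering algorithms and the sub-model modeling method in the following respects to build an effective multiple-model soft sensor system: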
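     1. Because traditional clustering methods depend heavily on prior knowledge of the sample data and on initial parameters, a single-parameter expanding search clustering algorithm suitable for sample distributions of arbitrary shape is proposed. The method builds on the nearest neighbor algorithm: it defines the ε-neighborhood of each sample and, by expanding search, assigns all samples whose ε-neighborhoods are connected to one cluster. Applying it to the sample data yields a multiple-model modeling method based on expanding search clustering.
     2. To suppress the influence of abnormal samples on the clustering result, a multiple-model soft sensor modeling method based on manifold clustering with local reconstruction and merging is proposed. The method splits the sample set into several mutually disjoint sub-clusters, which limits the influence of abnormal samples on the clustering result. A linear manifold surface is reconstructed from each sub-cluster, and sub-clusters that lie on the same manifold surface and are close to each other are merged into sub-classes. A support vector machine is then used to build a regression sub-model for each sub-class, giving the multiple-model soft sensor.
     3. To address the weakness of traditional clustering algorithms in handling incomplete information, a multiple-model modeling method based on secondary data partition is proposed. The sub-clusters obtained by clustering are partitioned a second time with an improved rough set classifier, which to some extent removes the effect that contradictory samples may have on model accuracy. A support vector machine builds a regression sub-model for each resulting sub-class, giving the multiple-model soft sensor system. Because an uneven sample distribution can cause class imbalance during classification, an improved weighted rough set classifier is adopted to refine the method further, raising classifier accuracy and ensuring the validity of the multiple models.
     4. Because the quality of the sub-models directly affects the final accuracy of the multiple model, a local penalized weighted kernel partial least squares algorithm is proposed. The method maps the original inputs into a high-dimensional feature space through a kernel mapping, so that the nonlinear problem can be treated linearly, and uses partial least squares to extract principal components and reduce the data dimension. On the new data set formed by the principal components, a local penalized weighted least squares regression model is built following the idea of local learning. It treats the contribution of each sample differently, which to some extent suppresses the influence of abnormal samples and optimizes the model parameters. Since multiple models can improve estimation accuracy and generalization, the expanding search clustering algorithm is used to cluster the sample set, and a regression sub-model is built on each resulting sub-cluster with the algorithm above, giving the multiple-model soft sensor system.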
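     The effectiveness of each method is verified in soft sensor modeling simulations of quality indices in the bisphenol A production process.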
Because multiple models can significantly improve estimation accuracy and generalization performance, this thesis, motivated by a practical industrial application, builds multiple-model soft sensors for a crystallization tower in order to monitor process variables online.
     Among the many multiple-model modeling methods, clustering-based ones have received the most attention. However, traditional clustering algorithms still leave open how to choose the number of clusters and the cluster centers, which indirectly constrains the development of multiple models. Most clustering algorithms are also sensitive to abnormal samples, which greatly reduces the quality of the clustering results. Moreover, traditional clustering algorithms use only the input set and ignore the strong influence of the output set on the clustering result, which limits the validity of the clusters. Finally, as the most important part of a multiple model, the sub-models directly determine its accuracy.
     To solve these problems, the thesis improves the clustering algorithms and the sub-model modeling approach in the following four respects, so as to establish effective multiple models.
     Because traditional clustering algorithms rely heavily on prior knowledge and initial parameters, a single-parameter expanding search clustering method suitable for sample distributions of arbitrary shape is presented. The method is based on the nearest neighbor algorithm. It defines the ε-neighborhood of each sample and, by expanding search, assigns all samples whose ε-neighborhoods are connected to one cluster. Applying it to the sample set yields a multiple-model modeling method based on expanding search clustering.
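
     The expanding search idea can be sketched as follows. This is a minimal illustration, not the thesis's implementation: it assumes Euclidean distance, treats two samples as connected when one lies within ε of the other, and grows each cluster by breadth-first search. The function name `expanding_search_clusters` and the toy points are hypothetical.

```python
from collections import deque
from math import dist


def expanding_search_clusters(points, eps):
    """Give the same label to samples linked by a chain of eps-neighbors."""
    labels = [-1] * len(points)  # -1 marks a sample not yet assigned
    cluster_id = 0
    for seed in range(len(points)):
        if labels[seed] != -1:
            continue
        # Start a new cluster at the first unassigned sample.
        labels[seed] = cluster_id
        frontier = deque([seed])
        while frontier:
            i = frontier.popleft()
            # Expand: pull in every unassigned sample in the eps-neighborhood of i.
            for j in range(len(points)):
                if labels[j] == -1 and dist(points[i], points[j]) <= eps:
                    labels[j] = cluster_id
                    frontier.append(j)
        cluster_id += 1
    return labels


# Toy data (assumed): a chain of three points, a pair, and one isolated point.
points = [(0.0, 0.0), (0.5, 0.0), (1.0, 0.0), (5.0, 5.0), (5.4, 5.0), (10.0, 0.0)]
labels = expanding_search_clusters(points, eps=0.6)
print(labels)
```

     In this sketch ε is the only tuning parameter, and the number of clusters falls out of the search instead of being fixed in advance. An isolated sample ends up in a cluster of its own.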
     Outliers can severely affect clustering results. To suppress their influence, a multiple-model soft sensor modeling method based on manifold clustering with local reconstruction and merging is proposed. The data set is split into several small disjoint sub-clusters, and a linear manifold surface is reconstructed from each sub-cluster. Sub-clusters that lie on the same manifold surface and are close to each other are merged into sub-classes. A support vector machine is then used to build a regression sub-model for each sub-class, giving the multiple-model soft sensor.
     Traditional clustering algorithms do not handle incomplete information well. A multiple-model modeling approach based on secondary data partition is presented. The sub-clusters obtained by clustering are partitioned a second time with an improved rough set classifier, which to some extent removes the effect of contradictory samples on model accuracy. A support vector machine builds a regression sub-model on each resulting sub-class, giving the multiple-model soft sensor system. Because an uneven sample distribution can cause class imbalance during classification, an improved weighted rough set classifier is adopted to refine the method further, raising classification accuracy and ensuring the reliability of the multiple models.
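
     What counts as a contradictory sample can be illustrated with a short sketch. It assumes discretized condition attributes and one class label per sample, and uses only the basic rough set notion of indiscernibility: samples with identical condition attributes but different labels contradict each other. The function name `split_consistent` and the toy data are hypothetical, and the improved and weighted rough set classifiers of the thesis are not reproduced here.

```python
from collections import defaultdict


def split_consistent(samples):
    """Separate samples whose condition attributes map to a single label
    from samples that share conditions but disagree on the label."""
    labels_by_condition = defaultdict(set)
    for condition, label in samples:
        labels_by_condition[condition].add(label)

    consistent, contradictory = [], []
    for condition, label in samples:
        if len(labels_by_condition[condition]) == 1:
            consistent.append((condition, label))
        else:
            contradictory.append((condition, label))
    return consistent, contradictory


# Toy data (assumed): condition attributes are tuples of discretized values.
samples = [
    ((1, 0), "A"),
    ((1, 0), "B"),  # same conditions as the previous sample, different label
    ((0, 1), "A"),
    ((0, 1), "A"),
    ((1, 1), "B"),
]
consistent, contradictory = split_consistent(samples)
print(len(consistent), len(contradictory))
```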
     The accuracy of the final multiple model depends directly on the quality of its sub-models. A novel local penalized weighted kernel partial least squares algorithm is presented. The method maps the original inputs into a high-dimensional feature space through a kernel mapping, so that the nonlinear problem can be treated linearly, and uses partial least squares to extract principal components and reduce the data dimension. On the new data set formed by the principal components, a local penalized weighted least squares regression model is built following the idea of local learning. It treats the contribution of each sample differently, reduces the model's sensitivity to abnormal data, and optimizes the model parameters. Because multiple models can improve estimation accuracy and generalization, the expanding search clustering algorithm is used to cluster the sample set and the proposed algorithm is used to build a regression sub-model on each sub-cluster, giving a multiple-model soft sensor system.
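
     The final regression step can be sketched for a single extracted component. This is an assumption-laden illustration, not the thesis's algorithm: it skips the kernel mapping and the partial least squares extraction, takes one score per sample as given, uses Gaussian weights centered on the query point as the local weighting, and uses a ridge penalty on the slope as the penalty term. The name `local_penalized_wls_predict` and the parameters `bandwidth` and `penalty` are hypothetical.

```python
from math import exp


def local_penalized_wls_predict(xs, ys, query, bandwidth=1.0, penalty=0.0):
    """Fit y ~ intercept + slope * x around `query` and return the prediction.

    Minimizes sum_i w_i * (y_i - intercept - slope * x_i)**2 + penalty * slope**2,
    where w_i is a Gaussian weight that decays with distance from `query`.
    """
    weights = [exp(-((x - query) ** 2) / (2.0 * bandwidth ** 2)) for x in xs]
    sw = sum(weights)
    sx = sum(w * x for w, x in zip(weights, xs))
    sy = sum(w * y for w, y in zip(weights, ys))
    sxx = sum(w * x * x for w, x in zip(weights, xs))
    sxy = sum(w * x * y for w, x, y in zip(weights, xs, ys))

    # Closed-form solution of the 2x2 penalized normal equations.
    det = sw * (sxx + penalty) - sx * sx
    intercept = (sy * (sxx + penalty) - sx * sxy) / det
    slope = (sw * sxy - sx * sy) / det
    return intercept + slope * query


# Toy data (assumed): exact line y = 2x + 1.
xs = [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.0 * x + 1.0 for x in xs]
prediction = local_penalized_wls_predict(xs, ys, query=2.5)
print(prediction)
```

     Samples far from the query receive small weights, so a distant abnormal sample has little effect on the local fit. A larger `penalty` pulls the slope toward zero and the prediction toward the locally weighted mean.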
     The proposed algorithms are applied to soft sensor modeling of quality indices in the bisphenol A production process, and the simulation results show their effectiveness.
