Research on Nonlinear Modeling by Partial Least Squares Regression and Its Recursive Algorithms
Abstract
Partial least squares (PLS) regression builds regression models by extracting orthogonal components from the data matrix. It effectively handles problems that cause ordinary least squares regression to fail: strong multicollinearity among the predictors, sample sizes far smaller than the number of variables, and moderate amounts of missing data. After several decades of development, PLS is widely used in chemistry, chemical engineering, economics, environmental science, food science, and other fields.
This thesis consists of two parts. The first part studies recent nonlinear PLS modeling algorithms, focusing on INLR (implicit nonlinear latent variable regression). INLR is easy to implement and predicts well, especially for systems with polynomial relationships: it augments the data matrix with nonlinear terms of the original variables (but not their cross terms), so that the nonlinear and cross terms of the latent components are implicitly contained in the expanded matrix, which is then modeled with ordinary PLS. However, the added nonlinear terms may also carry variation unrelated to the response, forcing the algorithm to extract more components, and the more components a model needs, the harder it is to interpret. To address this, the thesis corrects INLR with OPLS (orthogonal projections to latent structures), which removes the response-orthogonal variation from the expanded matrix; the improved algorithm is called OPLS-INLR. Simulation experiments show that OPLS-INLR retains the predictive ability of INLR while greatly reducing the number of components and improving the interpretability of the model. The second part studies recursive PLS algorithms. The classical recursive algorithms assume a continuous response; this thesis focuses on recursion with a categorical response and proposes a nonlinear recursive algorithm based on KL-PLS (kernel logistic partial least squares). On a red wine quality discrimination data set, the KL-PLS nonlinear recursive algorithm outperforms the ordinary logistic recursive algorithm, the PLS logistic recursive algorithm, and the plain KL-PLS algorithm.
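To make the first part concrete, the sketch below walks through the OPLS-INLR idea on synthetic data: the predictor matrix is expanded with squared terms only (the INLR step), one response-orthogonal component is filtered out of the expanded matrix in the manner described by Trygg and Wold [27] (the OPLS step), and ordinary PLS is fitted to the filtered matrix. It assumes scikit-learn's PLSRegression and a hand-written one-component OPLS filter; it illustrates the idea only and is not the implementation used in the thesis.

```python
# Illustrative OPLS-INLR sketch on synthetic data (not the thesis's code).
import numpy as np
from sklearn.cross_decomposition import PLSRegression

def opls_filter(X, y):
    """Remove one y-orthogonal component from column-centered X (single response)."""
    w = X.T @ y / (y @ y)              # covariance direction between X and y
    w /= np.linalg.norm(w)
    t = X @ w                          # predictive score
    p = X.T @ t / (t @ t)              # loading of the predictive score
    w_o = p - (w @ p) * w              # part of the loading orthogonal to w
    w_o /= np.linalg.norm(w_o)
    t_o = X @ w_o                      # y-orthogonal score
    p_o = X.T @ t_o / (t_o @ t_o)
    return X - np.outer(t_o, p_o)      # X with the orthogonal variation removed

rng = np.random.default_rng(0)
X = rng.normal(size=(100, 5))                                   # 100 samples, 5 predictors
y = 2.0 * X[:, 0] + X[:, 1] ** 2 + 0.1 * rng.normal(size=100)   # quadratic ground truth

X_inlr = np.hstack([X, X ** 2])        # INLR expansion: original variables plus their squares
X_inlr = X_inlr - X_inlr.mean(axis=0)  # center columns
y_c = y - y.mean()

X_filtered = opls_filter(X_inlr, y_c)  # OPLS step: strip y-unrelated variation

pls = PLSRegression(n_components=2)    # ordinary PLS on the filtered matrix
pls.fit(X_filtered, y_c)
print("Training R^2:", round(pls.score(X_filtered, y_c), 3))
```

Comparing this with plain PLS fitted directly to the expanded matrix shows how the filtering step is meant to move response-unrelated variation out of the predictive components, which is the effect the abstract attributes to OPLS-INLR.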
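For the second part, the following simplified sketch stands in for the KL-PLS idea: a Gaussian kernel matrix is computed, a few PLS components supervised by the 0/1 label are extracted from it, and a logistic regression is fitted on those components. The real KL-PLS algorithm of Tenenhaus et al. [11] builds the components from logistic (IRLS) working responses rather than the raw labels, and the thesis's recursive variant additionally updates the model as new observations arrive; neither refinement, nor the red wine data, is reproduced here. The data and the kernel width are illustrative assumptions.

```python
# Simplified kernel + PLS + logistic-regression sketch (a stand-in for KL-PLS).
import numpy as np
from sklearn.cross_decomposition import PLSRegression
from sklearn.linear_model import LogisticRegression
from sklearn.metrics.pairwise import rbf_kernel

rng = np.random.default_rng(1)
X = rng.normal(size=(200, 4))
y = (X[:, 0] ** 2 + X[:, 1] ** 2 > 2.0).astype(int)   # nonlinear class boundary

K = rbf_kernel(X, X, gamma=0.5)        # Gaussian kernel matrix of the training samples

pls = PLSRegression(n_components=3)    # supervised dimension reduction in kernel space
pls.fit(K, y)
T = pls.transform(K)                   # component scores

clf = LogisticRegression().fit(T, y)   # classification on the low-dimensional scores
print("Training accuracy:", round(clf.score(T, y), 3))
```

The spread (gamma) of the Gaussian kernel is a free parameter; reference [47] discusses how to choose it for classification and regression.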
References
    [1] Wang Huiwen, Wu Zaibin, Meng Jie. Linear and Nonlinear Methods of Partial Least Squares Regression[M]. Beijing: National Defense Industry Press, 2006
    [2] Wold S, Kettaneh-Wold N, Skagerberg B. Nonlinear PLS modeling[J]. Chemometrics and Intelligent Laboratory Systems, 1989, 7: 53-65
    [3] Berglund A, Wold S. Implicit nonlinear latent variable regression[J]. Chemometrics, 1997, 11: 141-156
    [4] Wold S. Nonlinear partial least squares modelling II. Spline inner relation[J]. Chemometrics and Intelligent Laboratory Systems, 1992, 14: 71-84
    [5] Helland K, Berntsen H. E, Borgen O. S, Martens H. Recursive algorithm for partial least squares regression[J]. Chemometrics Intell. Lab. Syst, 1992, 14: 129-137
    [6] Qin S. J. Recursive PLS algorithms for adaptive data modeling[J]. Computer Chem. Engng, 1998, 22: 503-514
    [7] Dayal B. S, MacGregor J. F. Recursive exponentially weighted PLS and its application to adaptive control and prediction[J]. Proc. Cont, 1997, 7(3): 169-179
    [8] Qin S. J. A recursive PLS algorithm for system identification[C]. St. Louis: AIChE Annual Meeting, 1993, 7-12
    [9] Li C. F, Ye H, Wang G. Z, Zhang J. A recursive nonlinear PLS algorithm for adaptive nonlinear process modeling[J]. Chem. Eng. Technol, 2005, 28(2): 141-152
    [10] Vinzi V E, Tenenhaus M. PLS Logistic Regression[J]. 2002.
    [11] Tenenhaus A, Giron A, Viennet E. Kernel logistic PLS: A tool for supervised nonlinear dimensionality reduction and binary classification[J]. Computational Statistics & Data Analysis, 2007, 51: 4083-4100
    [12] Geladi P, Kowalski B. R. Partial least-squares regression: a tutorial[J]. Analytica Chimica Acta, 1986, 185: 1-17
    [13] Wold H. In Research Papers in Statistics (Ed: F. David)[M]. Wiley & Sons, 1966
    [14] Dayal B. S, MacGregor J. F. Improved PLS algorithms[J]. Chemometrics, 1997, 11: 73-85
    [15] De Jong S. SIMPLS: An alternative approach to partial least squares regression[J]. Chemometrics and Intelligent Laboratory Systems, 1993, 18: 251-263
    [16] Cheng Zhong. Research on several key issues in PLSR modeling for chemistry and chemical engineering[D]. PhD dissertation. Zhejiang: Zhejiang University, 2005
    [17] Hoskuldsson A. PLS regression methods[J]. Chemometrics, 1988, 2: 211-228
    [18] Lindgren F, Wold S. Kernel-based PLS regression: Cross-validation and applications to spectral data[J]. Chemometrics, 1994, 8: 377–389
    [19] Zhu L. P. On distribution-weighted partial least squares with diverging number of highly correlated predictors[J]. J. R. Statist. Soc, 2009, 71: 525-548
    [20] Wold S, Kettanech-Wold N, Skagerberg B. Nonlinear PLS modeling[J]. Chemomet. Intell. Lab. Syst, 1989, 7: 53-65
    [21] Wold S. Nonlinear partial least squares modeling II. Spline inner relation[J]. Chemomet. Intell. Lab. Syst, 1992, 14: 71-84
    [22] Baffi G, Martin E. B, Morris A. J. Nonlinear projection to latent structures revisited: the quadratic PLS algorithm[J]. Computers and Chemical Engineering, 1999, 23: 395-411
    [23] Qin S. J, McAvoy T. J. Nonlinear PLS modeling using neural networks[J]. Computers and Chemical Engineering, 1992, 16: 379-391
    [24] Qin S. J. A statistical perspective of neural networks for process modeling and control[C]. Chicago, Illinois, USA: Proceedings of the 1993 International Symposium on Intelligent Control, 1993, 559-604
    [25] Wilson D. J. H, Irwin G. W, Lightbody G. Nonlinear PLS modeling using radial basis functions[C]. New Mexico: American Control Conference, 1997: 4-6
    [26] Baffi G, Martin E. B, Morris A. J. Nonlinear projection to latent structures revisited (the neural network PLS algorithm)[J]. Computers and Chemical Engineering, 1999, 23: 1293-1307
    [27] Trygg J, Wold S. Orthogonal projections to latent structures[J]. Chemometrics, 2002, 16: 119-128
    [28] Gottfries J, Johansson E, Trygg J. On the impact of uncorrelated variation in regression mathematics[J]. Chemometrics, 2008, 22: 565-570
    [29] Wold S, Antti H, Lindgren F. Orthogonal signal correction of near-infrared spectra[J]. Chemometrics Intell. Lab. Syst, 1998, 44: 175-185
    [30] Andersson C. A. Direct orthogonalization[J]. Chemometrics Intell. Lab. Syst, 1999, 47: 51-63
    [31] Fearn T. On orthogonal signal correction[J]. Chemometrics Intell. Lab. Syst, 2000, 50: 47-52
    [32] Wold S. Cross-validatory estimation of the number of components in factor and principal components models[J]. Technometrics, 1978, 20: 397-405
    [33] Verron T, Sabatier R, Joffre R. Some theoretical properties of the O-PLS method[J]. Chemometrics, 2004, 18: 62-68
    [34] Gabrielsson J, Jonsson H, Airiau C. OPLS methodology for analysis of pre-processing effects on spectroscopic data[J]. Chemometrics and Intelligent Lab System, 2006, 84: 153-158
    [35] Hedenstrom M, Wiklund S, Sundberg B. Visualization and interpretation of OPLS models based on 2D NMR data[J]. Chemometrics and Intelligent Lab System, 2008, 92: 110-117
    [36] Zhu Shiwu. A Tutorial on SAS Programming Techniques[M]. Beijing: Tsinghua University Press, 2007
    [37] Helland K, Berntsen H. E, Borgen O. S. Recursive algorithm for partial least squares regression[J]. Chemometrics and Intelligent Lab System, 1992, 14: 129-137
    [38] Qin S. J. Recursive PLS algorithms for adaptive data modeling[J]. Computer Chem. Engng, 1998, 22: 503-514
    [39] Dayal B. S, MacGregort J. F. Recursive exponentially weighted PLS and its application to adaptive control and prediction[J]. Proc. Cont, 1997, 7: 169-179
    [40] Vijaysai P, Gudi R. D, Lakshminarayanan S. Identification on demand using a blockwise recursive partial least-squares technique[J]. Ind. Eng. Chem. Res, 2003, 42: 540-554
    [41] Facco P, Bezzo F, Barolo M. Nearest-neighbor method for the automatic maintenance of multivariate statistical soft sensors in batch processing[J]. Ind. Eng. Chem. Res, 2010, 49: 2336-2347
    [42] Chen K. J, He M, Zhang D. Z. Sliding-window recursive PLS based soft sensing model and its application to the quality control of rubber mixing process[C]. In: Z. Cai et al. (Eds.), 2009, 51: 16-24
    [43] Li C. F, Ye H, Wang G. Z, Zhang J. A recursive nonlinear PLS algorithm for adaptive nonlinear process modeling[J]. Chem. Eng. Technol, 2005, 28: 141-152
    [44] Heinze G. A comparative investigation of methods for logistic regression with separated or nearly separated data[J]. Statistics in Medicine, 2006, 25: 4216-4226
    [45] Escabias M, Aguilera A. M, Valderrama M. J. Functional PLS logit regression model[J]. Computational Statistics & Data Analysis, 2007, 51: 4891-4902
    [46] Nicolai B. M, Theron K. I, Lammertyn J. Kernel PLS regression on wavelet transformed NIR spectra for prediction of sugar content of apple[J]. Chemometrics and Intelligent Laboratory Systems, 2007, 85: 243-252
    [47] Wang W. J, Xu Z. B, Lu W. Z. Determination of the spread parameter in the Gaussian kernel for classification and regression[J]. Neurocomputing, 2003, 55: 643-663
