Abstract
Selecting informative variables and extracting useful information from high-dimensional data is an active research topic in the era of big data. This paper applies variable selection methods to high-dimensional data and, through simulation, uses sensitivity and specificity to compare ridge regression, Lasso, adaptive Lasso, and Elastic Net regression and to identify the settings in which each method is suitable. It also discusses the application prospects of variable selection methods.
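As an illustration of how sensitivity and specificity can be read in a variable selection setting, the following is a minimal sketch and not the paper's own procedure. It assumes that a variable counts as "selected" when the absolute value of its estimated coefficient exceeds a small tolerance, that sensitivity is the share of truly active variables that are selected, and that specificity is the share of truly inactive variables that are excluded. The function name `selection_sensitivity_specificity`, the tolerance `tol`, and the coefficient vectors are hypothetical and chosen only for demonstration.

```python
from typing import Sequence, Tuple


def selection_sensitivity_specificity(
    true_coef: Sequence[float],
    est_coef: Sequence[float],
    tol: float = 1e-8,  # assumed threshold below which a coefficient counts as zero
) -> Tuple[float, float]:
    """Return (sensitivity, specificity) of a variable selection result.

    A variable is treated as active (or selected) when the absolute value
    of its true (or estimated) coefficient exceeds `tol`.
    """
    if len(true_coef) != len(est_coef):
        raise ValueError("coefficient vectors must have the same length")

    truly_active = [abs(b) > tol for b in true_coef]
    selected = [abs(b) > tol for b in est_coef]

    # True positives: active variables that were selected.
    tp = sum(t and s for t, s in zip(truly_active, selected))
    # True negatives: inactive variables that were left out.
    tn = sum((not t) and (not s) for t, s in zip(truly_active, selected))

    n_active = sum(truly_active)
    n_inactive = len(truly_active) - n_active

    sensitivity = tp / n_active if n_active else float("nan")
    specificity = tn / n_inactive if n_inactive else float("nan")
    return sensitivity, specificity


# Hypothetical data: 3 truly active variables among 8 candidates.
true_beta = [3.0, 1.5, 0.0, 0.0, 2.0, 0.0, 0.0, 0.0]
# A sparse estimate of the kind an L1 penalty can produce (made-up values).
sparse_estimate = [2.7, 1.1, 0.0, 0.4, 0.0, 0.0, 0.0, 0.0]
# A dense estimate of the kind an L2 penalty produces (made-up values).
dense_estimate = [2.5, 1.2, 0.1, 0.3, 1.6, -0.2, 0.05, 0.1]

sparse_sens, sparse_spec = selection_sensitivity_specificity(true_beta, sparse_estimate)
dense_sens, dense_spec = selection_sensitivity_specificity(true_beta, dense_estimate)
print(sparse_sens, sparse_spec)
print(dense_sens, dense_spec)
```

In this made-up example the dense estimate keeps every variable, so its sensitivity is 1 while its specificity is 0. This reflects the fact that ridge regression shrinks coefficients without setting them exactly to zero, so a comparison of this kind typically needs an explicit threshold for ridge estimates.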