Local Linear Estimation for the SEVIS Method and Its Application to Ultrahigh-Dimensional Data
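  • English title: Local Estimation of Sure Explained Variability Independence Screening and Its Application for Ultrahigh-dimensional Data
  • Authors: 连亦旻 (Lian Yimin); 陈钊 (Chen Zhao); 舒明良 (Shu Mingliang)
  • Authors (English): LIAN YIMIN; CHEN ZHAO; SHU MINGLIANG; Department of Statistics and Finance, University of Science and Technology of China; Department of Statistics, Pennsylvania State University, State College; Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Keywords: feature screening; local linear estimation; SEVIS; ultrahigh-dimensional data
  • English keywords: feature screening; local estimation; SEVIS; ultrahigh-dimensional data
  • Journal title (Chinese): YYSU
  • Journal title (English): Acta Mathematicae Applicatae Sinica
  • Affiliations: Department of Statistics and Finance, University of Science and Technology of China; Department of Statistics, Pennsylvania State University, State College; Academy of Mathematics and Systems Science, Chinese Academy of Sciences
  • Publication date: 2018-01-15
  • Publisher: Acta Mathematicae Applicatae Sinica (应用数学学报)
  • Year: 2018
  • Issue: v.41
  • Language: Chinese
  • Pages: YYSU201801001
  • Page count: 13
  • CN: 01
  • ISSN: 11-2040/O1
  • Classification number: 3-15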
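Abstract
In the era of big data, how to screen the truly important features out of ultrahigh-dimensional data has become a question of wide concern to researchers in many related fields. The core idea of feature screening is to achieve this by excluding features that are clearly unrelated to the response variable. The kernel-based SEVIS (Sure Explained Variability and Independence Screening) feature screening method outperforms, to some extent, earlier feature screening models on asymmetric and nonlinear data, but its use of kernel estimation for the nonparametric part leaves room for further improvement. Starting from this observation, this paper replaces the kernel estimation in the algorithm with local linear estimation and considers the variable selection process in several special cases. The results show that SEVIS based on local linear estimation outperforms kernel-based SEVIS in both accuracy and running efficiency.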
How to extract the truly important features from ultrahigh-dimensional data is a question of considerable concern, especially in today's era of big data, and it plays a key role in many related industries. The core idea of feature screening is to answer this question by excluding features that are clearly unrelated to the response variable. The Sure Explained Variability and Independence Screening (SEVIS) method has clear advantages over earlier methods in handling asymmetric and nonlinear situations, but its kernel estimation still leaves room for improvement. From this point of view, we replace the kernel estimation with local estimation, which is known to be more accurate and efficient. Simulations of feature screening in special situations also support our view and show that the new algorithm is more efficient than the kernel-based one.
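The contrast the abstract draws between kernel estimation and local linear estimation can be illustrated with a minimal sketch. This is not the authors' algorithm: the Gaussian kernel, the fixed bandwidth `h`, the in-sample plug-in score 1 - E[(Y - E[Y|X])^2] / Var(Y), and every function name below are assumptions made only for illustration. Each feature is scored on its own by how much of the response variance its estimated conditional mean explains, and the features are then ranked by that score.

```python
import math
import random


def gaussian_kernel(u):
    """Standard Gaussian kernel K(u) (an illustrative choice)."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def kernel_estimate(x0, xs, ys, h):
    """Nadaraya-Watson (local constant) estimate of E[Y | X = x0]."""
    weights = [gaussian_kernel((x - x0) / h) for x in xs]
    return sum(w * y for w, y in zip(weights, ys)) / sum(weights)


def local_linear_estimate(x0, xs, ys, h):
    """Local linear estimate of E[Y | X = x0].

    Fits a kernel-weighted least-squares line around x0 and returns its
    intercept, written in closed form with the weighted moments s_k, t_k.
    """
    s0 = s1 = s2 = t0 = t1 = 0.0
    for x, y in zip(xs, ys):
        d = x - x0
        w = gaussian_kernel(d / h)
        s0 += w
        s1 += w * d
        s2 += w * d * d
        t0 += w * y
        t1 += w * d * y
    return (s2 * t0 - s1 * t1) / (s0 * s2 - s1 * s1)


def explained_variability(xs, ys, h, estimator=local_linear_estimate):
    """In-sample plug-in estimate of 1 - E[(Y - E[Y|X])^2] / Var(Y)."""
    n = len(ys)
    mean_y = sum(ys) / n
    var_y = sum((y - mean_y) ** 2 for y in ys) / n
    resid = sum((y - estimator(x, xs, ys, h)) ** 2 for x, y in zip(xs, ys)) / n
    return 1.0 - resid / var_y


def screen(features, ys, h, keep):
    """Score each feature column marginally and return the top `keep` indices."""
    scores = [explained_variability(col, ys, h) for col in features]
    order = sorted(range(len(features)), key=lambda j: scores[j], reverse=True)
    return order[:keep], scores
```

On exactly linear data the local linear fit reproduces the line even at the edge of the sample, whereas the kernel average is pulled toward the interior. That boundary behaviour is one commonly cited reason to prefer local linear estimation. A practical implementation would also need a data-driven bandwidth and a threshold rule for how many features to keep.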
References
[1] Tibshirani R. Regression shrinkage and selection via the lasso. J. R. Statist. Soc. B, 1996, 58: 267-288
[2] Yuan M, Lin Y. Model selection and estimation in regression with grouped variables. Journal of the Royal Statistical Society: Series B, 2006, 68: 49-67
[3] Zou H. The adaptive lasso and its oracle properties. Journal of the American Statistical Association, 2006, 101: 1418-1429
[4] Fan J. Comment on "Wavelets in statistics: a review" by A. Antoniadis. J. Ital. Statist. Soc., 1997, 2: 131-138
[5] Zou H, Hastie T. Regularization and variable selection via the elastic net. Journal of the Royal Statistical Society, Series B, 2005, 67: 301-320
[6] Candes E, Tao T. The Dantzig selector: statistical estimation when p is much larger than n (with discussion). The Annals of Statistics, 2007, 35: 2313-2404
[7] Fan J, Samworth R, Wu Y. Ultrahigh dimensional feature selection: beyond the linear model. Journal of Machine Learning Research, 2009, 10: 1829-1853
[8] Fan J, Lv J. Sure independence screening for ultrahigh dimensional feature space (with discussion). Journal of the Royal Statistical Society, Series B, 2008, 70: 849-911
[9] Fan J, Song R. Sure independence screening in generalized linear models with NP-dimensionality. The Annals of Statistics, 2010, 38: 3567-3604
[10] Fan J, Feng Y, Song R. Nonparametric independence screening in sparse ultra-high dimensional additive models. Journal of the American Statistical Association, 2011, 106: 544-557
[11] Song R, Yi F, Zou H. On varying-coefficient independence screening for high-dimensional varying-coefficient models. Statistica Sinica, 2012, 24: 1735-1752
[12] Liu J, Li R, Wu R. Feature selection for varying coefficient models with ultrahigh dimensional covariates. Technical Report, Department of Statistics, Pennsylvania State University, 2013
[13] Zhu L P, Li L, Zhu L X. Model-free feature screening for ultrahigh dimensional data. Journal of the American Statistical Association, 2011, 106: 1464-1475
[14] Li R, Zhong W, Zhu L. Feature screening via distance correlation learning. Journal of the American Statistical Association, 2012, 107: 1129-1139
[15] He X, Wang L, Hong H G. Quantile-adaptive model-free variable screening for high-dimensional heterogeneous data. The Annals of Statistics, 2013, 41: 342-369
[16] Wu Y, Yin G. Conditional quantile screening in ultrahigh-dimensional heterogeneous data. Biometrika, 2014, 102: 65-76
[17] Chen M, Lian Y, Chen Z, Zhang Z. Sure Explained Variability and Independence Screening. Journal of Nonparametric Statistics, 2017, 1-35
[18] Fan J. Design-adaptive nonparametric regression. Journal of the American Statistical Association, 1992, 87: 998-1004
[19] Zheng S, Shi N Z, Zhang Z. Generalized measures of correlation for asymmetry, nonlinearity, and beyond. Journal of the American Statistical Association, 2012, 107: 1239-1252
