Leave-one-out cross-validation is risk consistent for lasso
  • Authors: Darren Homrighausen; Daniel J. McDonald
  • Keywords: Stochastic equicontinuity; Uniform convergence; Persistence
  • Journal: Machine Learning
  • Publication date: October 2014
  • Volume: 97
  • Issue: 1-2
  • Pages: 65-78
  • References:
    1. Bickel, P. J., Ritov, Y., & Tsybakov, A. B. (2009). Simultaneous analysis of lasso and Dantzig selector. The Annals of Statistics, 37(4), 1705-1732.
    2. Bousquet, O., & Elisseeff, A. (2002). Stability and generalization. The Journal of Machine Learning Research, 2, 499-526.
    3. Bunea, F., Tsybakov, A., & Wegkamp, M. (2007). Sparsity oracle inequalities for the lasso. Electronic Journal of Statistics, 1, 169-194.
    4. Chatterjee, A., & Lahiri, S. (2011). Strong consistency of lasso estimators. Sankhya A: Mathematical Statistics and Probability, 73(1), 55-78.
    5. Chen, S. S., Donoho, D. L., & Saunders, M. A. (1998). Atomic decomposition by basis pursuit. SIAM Journal on Scientific Computing, 20(1), 33-61.
    6. Davidson, J. (1994). Stochastic limit theory: An introduction for econometricians. Oxford: Oxford University Press.
    7. Efron, B., Hastie, T., Johnstone, I., & Tibshirani, R. (2004). Least angle regression. The Annals of Statistics, 32(2), 407-499.
    8. Knight, K., & Fu, W. (2000). Asymptotics for lasso-type estimators. The Annals of Statistics, 28(5), 1356-1378.
    9. van de Geer, S., & Lederer, J. (2013). The Lasso, correlated design, and improved oracle inequalities. arXiv:1107.0189. http://arxiv.org/abs/1107.0189
    10. Grandvalet, Y. (1998). Least absolute shrinkage is equivalent to quadratic penalization. In ICANN 98 (pp. 201-206). London: Springer.
    11. Greenshtein, E., & Ritov, Y. A. (2004). Persistence in high-dimensional linear predictor selection and the virtue of overparametrization. Bernoulli, 10(6), 971-988.
    12. Győrfi, L., Kohler, M., Krzyżak, A., & Walk, H. (2002). A distribution-free theory of nonparametric regression. New York: Springer.
    13. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning: Data mining, inference, and prediction (2nd ed.). New York: Springer.
    14. Lee, S., Zhu, J., & Xing, E. P. (2010). Adaptive multi-task Lasso: With application to eQTL detection. In J. Lafferty, C. K. I. Williams, J. Shawe-Taylor, R. Zemel, & A. Culotta (Eds.), Advances in neural information processing systems (Vol. 23, pp. 1306-1314).
    15. Leng, C., Lin, Y., & Wahba, G. (2006). A note on the lasso and related procedures in model selection. Statistica Sinica, 16(4), 1273-1284.
    16. Meinshausen, N., & Bühlmann, P. (2006). High-dimensional graphs and variable selection with the lasso. The Annals of Statistics, 34(3), 1436-1462.
    17. Newey, W. K. (1991). Uniform convergence in probability and stochastic equicontinuity. Econometrica, 59(4), 1161-1167.
    18. Osborne, M., Presnell, B., & Turlach, B. (2000). On the lasso and its dual. Journal of Computational and Graphical Statistics, 9(2), 319-337.
    19. Schaffer, C. (1993). Selecting a classification method by cross-validation. Machine Learning, 13, 135-143.
    20. Shao, J. (1993). Linear model selection by cross-validation. Journal of the American Statistical Association, 88, 486-494.
    21. Shi, W., Wahba, G., Wright, S., Lee, K., Klein, R., & Klein, B. (2008). LASSO-Patternsearch algorithm with application to ophthalmology and genomic data. Statistics and Its Interface, 1(1), 137-153.
    22. Stromberg, K. (1994). Probability for analysts. London: Chapman & Hall.
    23. Tibshirani, R. (1996). Regression shrinkage and selection via the lasso. Journal of the Royal Statistical Society: Series B (Methodological), 58(1), 267-288.
    24. Tibshirani, R. (2011). Regression shrinkage and selection via the lasso: A retrospective. Journal of the Royal Statistical Society: Series B (Statistical Methodology), 73(3), 273-282.
    25. Tibshirani, R. J. (2013). The lasso problem and uniqueness. Electronic Journal of Statistics, 7, 1456-1490.
    26. Tibshirani, R. J., & Taylor, J. (2012). Degrees of freedom in lasso problems. The Annals of Statistics, 40, 1198-1232.
    27. Wang, H., & Leng, C. (2007). Unified lasso estimation by least squares approximation. Journal of the American Statistical Association, 102(479), 1039-1048.
    28. Xu, H., Mannor, S., & Caramanis, C. (2008). Sparse algorithms are not stable: A no-free-lunch theorem. In Proceedings of the 46th Annual Allerton Conference on Communication, Control, and Computing (pp. 1299-1303).
    29. Zou, H., Hastie, T., & Tibshirani, R. (2007). On the degrees of freedom of the lasso. The Annals of Statistics, 35(5), 2173-2192.
  • Author affiliations: Darren Homrighausen (1)
    Daniel J. McDonald (2)

    1. Department of Statistics, Colorado State University, Fort Collins, CO, 80523, USA
    2. Department of Statistics, Indiana University, Bloomington, IN, 47408, USA
  • ISSN: 1573-0565
Abstract
The lasso procedure pervades the statistical and signal processing literature, and as such, is the target of substantial theoretical and applied research. While much of this research focuses on the desirable properties that lasso possesses—predictive risk consistency, sign consistency, correct model selection—these results assume that the tuning parameter is chosen in an oracle fashion. Yet, this is impossible in practice. Instead, data analysts must use the data twice, once to choose the tuning parameter and again to estimate the model. But only heuristics have ever justified such a procedure. To this end, we give the first definitive answer about the risk consistency of lasso when the smoothing parameter is chosen via cross-validation. We show that under some restrictions on the design matrix, the lasso estimator is still risk consistent with an empirically chosen tuning parameter.
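
The abstract describes a concrete two-stage procedure: estimate the prediction risk of each candidate tuning parameter by leave-one-out cross-validation, pick the empirical minimizer, and refit the lasso on the full data. Below is a minimal Python sketch of that procedure; it is not the authors' code, and the simulated data, the candidate grid lambdas, and the use of scikit-learn's Lasso estimator are illustrative assumptions.

```python
# Hypothetical illustration: lasso with a leave-one-out CV-chosen tuning
# parameter, i.e., the setting whose risk consistency the paper studies.
import numpy as np
from sklearn.linear_model import Lasso
from sklearn.model_selection import LeaveOneOut

rng = np.random.default_rng(0)
n, p = 50, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:3] = [2.0, -1.5, 1.0]                  # sparse "true" coefficients (made up)
y = X @ beta + rng.standard_normal(n)

lambdas = np.logspace(-3, 1, 30)             # assumed grid of candidate tuning parameters
cv_risk = []
for lam in lambdas:
    # Leave-one-out estimate of the prediction risk at this tuning parameter:
    # fit on n-1 points, square the error on the held-out point, average.
    errs = [
        (y[test][0] - Lasso(alpha=lam).fit(X[train], y[train]).predict(X[test])[0]) ** 2
        for train, test in LeaveOneOut().split(X)
    ]
    cv_risk.append(np.mean(errs))

lam_hat = lambdas[int(np.argmin(cv_risk))]   # empirically chosen tuning parameter
final = Lasso(alpha=lam_hat).fit(X, y)       # data are "used twice": selection, then fitting
print(f"selected lambda: {lam_hat:.4f}")
print("nonzero coefficients:", np.flatnonzero(final.coef_))
```

The final fit reuses the same data that selected lam_hat; the paper's result is that, under restrictions on the design matrix, this data-driven choice still yields a risk-consistent lasso estimator.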
