Augmented Leverage Score Sampling with Bounds

详细信息查看全文

关键词：Subset selection ; Low ; rank matrix approximation ; Leverage scores ; Spectral analysis
刊名：Lecture Notes in Computer Science
出版年：2016
出版时间：2016
年：2016
卷：9852
期：1
页码：543-558
全文大小：1,561 KB
参考文献：1.Boutsidis, C., Drineas, P., Magdon-Ismail, M.: Near-optimal column-based matrix reconstruction. SIAM J. Comput. 10598(i), 1–27 (2014). http://epubs.siam.org/doi/abs/10.1137/12086755X MathSciNet MATH
2.Brouwer, A.E., Haemers, W.H.: Spectra of Graphs. Springer, New York (2011). https://dx.doi.org/10.1007/978-1-4614-1939-6
3.Chan, T.F., Hansen, P.C.: Some applications of the rank revealing QR factorization. SIAM J. Sci. Comput. 13(3), 727–741 (1992)MathSciNet CrossRef MATH
4.Drineas, P., Mahoney, M.W.: On the Nystrom method for approximating a gram matrix for improved kernel-based learning (Extended abstract). In: Proceedings of Learning Theory, vol. 3559, pp. 323–337 (2005). Go to ISI://WOS:000230769100022
5.Drineas, P., Mahoney, M., Muthukrishnan, S.: Relative-error CUR matrix decompositions. SIAM J. Matrix Anal. Appl. 30(2), 844–881 (2008)MathSciNet CrossRef MATH
6.Efron, B., Hastie, T., Johnstone, I., Tibshirani, R.: Least angle regression. Ann. Stat. 32(2), 407–499 (2004)MathSciNet CrossRef MATH
7.Gittens, A.: The spectral norm error of the naive Nystrom extension. arXiv preprint, pp. 1–9 (2011). http://arxiv.org/pdf/1110.5305 , http://arxiv.org/abs/1110.5305
8.Gittens, A., Mahoney, M.W.: Revisiting the Nystrom method for improved large-scale machine learning. ICML 28, 1–45 (2013)MATH
9.Golub, G.: Numerical methods for solving linear least squares problems. Numer. Math. 7(1), 206–216 (1965)MathSciNet CrossRef MATH
10.Klimt, B., Yang, Y.: Introducing the Enron corpus. In: CEAS (2004)
11.Leskovec, J., Lang, K.J., Dasgupta, A., Mahoney, M.W.: Community structure in large networks: natural cluster sizes and the absence of large well-defined clusters. Internet Math. 6(1), 29–123 (2009)MathSciNet CrossRef MATH
12.Lichman, M.: UCI machine learning repository (2013). http://archive.ics.uci.edu/ml
13.Mahoney, M.W.: Algorithmic and Statistical Perspectives on Large-Scale Data Analysis, p. 33 (2010). http://arxiv.org/abs/1010.1609
14.Mahoney, M.M.W.: Randomized algorithms for matrices and data. arXiv preprint arXiv:1104.5557 , p. 49 (2011)
15.Papailiopoulos, D., Kyrillidis, A., Boutsidis, C.: Provable deterministic leverage score sampling. arXiv preprint, pp. 997–1006 (2014). http://dl.acm.org/citation.cfm?doid=2623330.2623698
16.Paul, S., Magdon-Ismail, M., Drineas, P.: Column Selection via Adaptive Sampling, pp. 1–9 (2015). http://arxiv.org/abs/1510.04149
作者单位：Daniel J. Perry (17)
Ross T. Whitaker (17)

17. SCI Institute, University of Utah, 72 S. Central Campus Drive, Salt Lake City, UT, 84112, USA
丛书名：Machine Learning and Knowledge Discovery in Databases
ISBN：978-3-319-46227-1
刊物类别：Computer Science
刊物主题：Artificial Intelligence and Robotics
Computer Communication Networks
Software Engineering
Data Encryption
Database Management
Computation by Abstract Devices
Algorithm Analysis and Problem Complexity
出版者：Springer Berlin / Heidelberg
ISSN：1611-3349
卷排序：9852

文摘

We introduce a modification to the well studied leverage score sampling algorithm which takes into account data scale, called the augmented leverage score, and introduce an initial error bound proof for the case of deterministic sampling – which to our knowledge is the first bound for this augmented leverage score. We discuss the implications of the error bounds proof and present an empirical evaluation of the proposed augmented leverage score performance on the column subsample selection problem (CSSP) as compared to the traditional leverage score and other methods in both a deterministic and probabilistic sampling paradigm. We show that the augmentation of the leverage score improves the empirical performance on CSSP significantly for many kinds of data.

地址：北京市海淀区学院路29号邮编：100083

电话：办公室：(+86 10)66554848；文献借阅、咨询服务、科技查新：66554700