Matrix Recovery Algorithms and Error Analysis
Abstract
Compressed sensing is an emerging theory for data acquisition, encoding, and decoding under the assumption that the data are sparse or compressible. In compressed sensing, the object we wish to acquire is a vector, while in many applications of practical interest, such as image inpainting and the Netflix problem, we wish to reconstruct an object in the form of a matrix, which is more convenient for data modeling, processing, and analysis. However, such data often suffer from deficiency, loss, or corruption by noise. The goal of matrix recovery is to recover the exact data in these situations.

In this thesis, we systematically review the basic theory and main methods of matrix recovery around three core problems: modeling, algorithm design, and error analysis. First, we present a quick overview of the background, significance, and research status of matrix recovery. The mathematical models and assumptions for matrix recovery are provided. We introduce the methods for solving these models, summarize their advantages and disadvantages, and then propose an elastic-net regularization model for matrix recovery. Second, the solution of the matrix elastic-net regularization model and its properties are discussed. We construct an iterative algorithm for solving the model and provide a convergence analysis of the algorithm. Moreover, we analyze the generalization error bounds of matrix elastic-net regularization under different assumptions. Finally, we characterize the conditions on the penalty function that lead to uniform β-stability and present numerical experiments to illustrate the effectiveness and the stability of the algorithm.

The main results of the thesis can be summarized as follows:

(1) In low-rank matrix recovery, traditional algorithms obtain low-rank solutions by minimizing the nuclear norm of the matrix. However, these algorithms are unstable when the data matrix is highly correlated. We therefore consider matrix elastic-net regularization, which yields stable solutions by minimizing the empirical risk penalized with a combination of the nuclear norm and the Frobenius norm (a sketch of the model appears after the abstract). Moreover, the solution of the model can be characterized as the fixed point of a contractive map through the proximity operator. We then construct a fixed-point iterative scheme for solving the model and prove that the sequence of iterates converges to the solution of the matrix elastic-net regularization.

(2) The error bounds of the matrix elastic-net regularization model are analyzed under the matrix restricted isometry property (RIP). We give the definition of the matrix RIP (sketched below), show that certain classes of random measurements satisfy it, and prove that under the matrix RIP the error is proportional to the number of degrees of freedom times the noise level. An error bound is also established for full-rank matrices.

(3) In the framework of statistical learning theory, we give a comprehensive analysis of the convergence and generalization performance of matrix recovery algorithms under operator assumptions. We cast the matrix recovery problem as a learning problem and define a family of Hilbert-Schmidt operators. The generalization error bounds for matrix recovery are then obtained through estimates of these Hilbert-Schmidt operators. We also give an adaptive scheme for selecting the regularization parameter.

(4) The stability of matrix recovery algorithms is studied. We consider the uniform β-stability of matrix recovery algorithms (defined below) and characterize the conditions on the penalty function that guarantee it. In particular, we apply our results to show that matrix elastic-net regularization is uniformly β-stable.
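To fix notation, here is a minimal sketch of the matrix elastic-net model of result (1), written in our own symbols (the sampling operator \mathcal{A}, observations b, regularization parameters \tau, \gamma > 0, and step size \delta are illustrative choices, not necessarily the thesis's exact notation):

    \min_{X \in \mathbb{R}^{n_1 \times n_2}} \; \frac{1}{2}\|\mathcal{A}(X) - b\|_2^2 + \tau\|X\|_* + \frac{\gamma}{2}\|X\|_F^2

By standard properties of the proximity operator, a minimizer satisfies the fixed-point equation

    X = \mathcal{D}_{\delta\tau/(1+\delta\gamma)}\!\left(\frac{X - \delta\,\mathcal{A}^*(\mathcal{A}(X) - b)}{1 + \delta\gamma}\right),
    \qquad
    \mathcal{D}_{\mu}(Y) = U \max(\Sigma - \mu I,\, 0)\, V^{\top} \ \text{ for } Y = U \Sigma V^{\top},

where \mathcal{D}_{\mu} denotes singular value soft-thresholding; iterating this map gives a fixed-point scheme of the kind the thesis constructs.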
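A runnable numerical sketch of such an iteration, assuming the matrix-completion special case in which \mathcal{A} samples observed entries (so the gradient step reduces to a masked residual); every name here (svt, matrix_elastic_net, tau, gamma, step) is ours for illustration, not from the thesis:

    import numpy as np

    def svt(Y, mu):
        """Singular value soft-thresholding: the prox of mu * nuclear norm."""
        U, s, Vt = np.linalg.svd(Y, full_matrices=False)
        return U @ np.diag(np.maximum(s - mu, 0.0)) @ Vt

    def matrix_elastic_net(b, mask, shape, tau=1.0, gamma=0.1, step=1.0,
                           max_iter=500, tol=1e-6):
        """Fixed-point (proximal forward-backward) iteration for
        min_X 0.5*||P_Omega(X) - b||_F^2 + tau*||X||_* + 0.5*gamma*||X||_F^2,
        where P_Omega keeps the entries indicated by `mask`."""
        X = np.zeros(shape)
        for _ in range(max_iter):
            # Gradient of the data-fit term: residual on observed entries only.
            grad = np.where(mask, X - b, 0.0)
            # Prox of tau*||.||_* + 0.5*gamma*||.||_F^2: scale by
            # 1/(1 + step*gamma), then shrink the singular values.
            X_new = svt((X - step * grad) / (1.0 + step * gamma),
                        step * tau / (1.0 + step * gamma))
            if np.linalg.norm(X_new - X) <= tol * max(1.0, np.linalg.norm(X)):
                return X_new
            X = X_new
        return X

    # Tiny usage example: recover a rank-2 matrix from 50% of its entries.
    rng = np.random.default_rng(0)
    M = rng.standard_normal((30, 2)) @ rng.standard_normal((2, 30))
    mask = rng.random(M.shape) < 0.5
    b = np.where(mask, M + 0.01 * rng.standard_normal(M.shape), 0.0)
    X_hat = matrix_elastic_net(b, mask, M.shape, tau=2.0, gamma=0.05)

The Frobenius term only rescales the argument and the threshold of the soft-thresholding step, which is why the elastic-net iteration costs essentially the same per step as plain nuclear-norm minimization while damping the instability noted in result (1).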
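The matrix RIP invoked in result (2) has the following standard form (our statement; the constants are schematic): a linear map \mathcal{A} : \mathbb{R}^{n_1 \times n_2} \to \mathbb{R}^m satisfies the RIP of rank r with constant \delta_r \in (0, 1) if

    (1 - \delta_r)\|X\|_F^2 \;\le\; \|\mathcal{A}(X)\|_2^2 \;\le\; (1 + \delta_r)\|X\|_F^2
    \quad \text{for all } X \text{ with } \operatorname{rank}(X) \le r.

Under such a condition, error bounds of the kind the abstract describes take the schematic shape \|\hat{X} - X\|_F^2 \le C\, r(n_1 + n_2 - r)\, \sigma^2, i.e., the degrees of freedom of a rank-r matrix times the noise level; the precise constants are those established in the thesis.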
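Finally, the uniform β-stability in result (4) follows the notion of Bousquet and Elisseeff; a sketch in our notation, with \ell a loss function, S a training set of m examples, and S^{\setminus i} the set with the i-th example removed:

    \sup_{z} \big| \ell(f_S, z) - \ell(f_{S^{\setminus i}}, z) \big| \;\le\; \beta
    \quad \text{for all } S \text{ and all } i \in \{1, \dots, m\}.

An algorithm satisfying this for every training set is uniformly β-stable; the thesis characterizes the penalty functions for which matrix recovery algorithms meet such a condition.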
