Parallel coordinate descent methods for big data optimization
  • Authors: Peter Richtárik; Martin Takáč
  • Keywords: Parallel coordinate descent; Big data optimization; Partial separability; Huge-scale optimization; Iteration complexity; Expected separable over-approximation; Composite objective; Convex optimization; LASSO; 90C06; 90C25; 49M20; 49M27; 65K05; 68W10; 68W20; 68W40
  • Journal: Mathematical Programming
  • Year: 2016
  • Published: March 2016
  • Volume: 156
  • Issue: 1-2
  • Pages: 433-484
  • Full text size: 1,152 KB
  • Author affiliations: Peter Richtárik (1)
    Martin Takáč (1)

    1. School of Mathematics, University of Edinburgh, Edinburgh, UK
  • Journal category: Mathematics and Statistics
  • Journal subjects: Mathematics
    Calculus of Variations and Optimal Control
    Mathematics of Computing
    Numerical Analysis
    Combinatorics
    Mathematical and Computational Physics
    Mathematical Methods in Physics
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1436-4646
Abstract
In this work we show that randomized (block) coordinate descent methods can be accelerated by parallelization when applied to the problem of minimizing the sum of a partially separable smooth convex function and a simple separable convex function. The theoretical speedup, as compared to the serial method, and referring to the number of iterations needed to approximately solve the problem with high probability, is a simple expression depending on the number of parallel processors and a natural and easily computable measure of separability of the smooth component of the objective function. In the worst case, when no degree of separability is present, there may be no speedup; in the best case, when the problem is separable, the speedup is equal to the number of processors. Our analysis also works in the mode when the number of blocks being updated at each iteration is random, which allows for modeling situations with busy or unreliable processors. We show that our algorithm is able to solve a LASSO problem involving a matrix with 20 billion nonzeros in 2 h on a large memory node with 24 cores.
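
To make the abstract's setup concrete, the following is a minimal Python/NumPy sketch of one way a synchronous parallel coordinate descent loop for LASSO can look: a random subset of tau coordinates is updated simultaneously, with each step damped by a factor that grows with the coupling between coordinates. It is an illustration under stated assumptions, not the authors' implementation; in particular, taking omega as the maximum number of nonzeros per row of A as the separability measure, and the damping formula for beta, are modeled on the usual ESO-style constants rather than copied from the paper's analysis.

```python
import numpy as np

def soft_threshold(v, t):
    """Proximal operator of t*|.| (soft-thresholding)."""
    return np.sign(v) * np.maximum(np.abs(v) - t, 0.0)

def parallel_cd_lasso(A, b, lam, tau=4, n_iters=2000, seed=0):
    """Sketch of synchronous parallel coordinate descent for
    LASSO: minimize 0.5*||Ax - b||^2 + lam*||x||_1.

    Each iteration samples tau coordinates uniformly at random and updates
    them simultaneously (Jacobi-style), damping every step by beta.
    The constants here are illustrative assumptions, not the paper's ESO.
    """
    rng = np.random.default_rng(seed)
    m, n = A.shape
    x = np.zeros(n)
    residual = A @ x - b                              # maintained incrementally
    L = np.maximum((A ** 2).sum(axis=0), 1e-12)       # per-coordinate Lipschitz constants ||A_:i||^2
    omega = int(np.count_nonzero(A, axis=1).max())    # proxy for the degree of partial separability
    beta = 1.0 + (omega - 1) * (tau - 1) / max(n - 1, 1)

    for _ in range(n_iters):
        S = rng.choice(n, size=tau, replace=False)
        # All tau coordinates read the same residual, as if each were owned
        # by a separate processor; their updates are applied afterwards.
        deltas = []
        for i in S:
            g_i = A[:, i] @ residual                  # partial gradient of the smooth part
            step = 1.0 / (beta * L[i])
            deltas.append((i, soft_threshold(x[i] - step * g_i, lam * step) - x[i]))
        for i, d in deltas:
            x[i] += d
            residual += d * A[:, i]                   # keep the residual in sync
    return x
```

The role of the damping matches the speedup range stated in the abstract: when A has few nonzeros per row (small omega), beta stays close to 1 and updating tau coordinates per iteration cuts the iteration count roughly tau-fold, whereas for a fully dense A beta is of order tau and the damping largely cancels the gain from parallelism.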
