Large Scale Optimization for Machine Learning
Details
  • Author: Wang, Huahua
  • Degree: Doctor
  • Year: 2014
  • Keywords: Applied sciences; Data mining; Machine learning; Opti
  • Advisor: Banerjee, Arindam
  • Institution: University of Minnesota
  • Department: Computer Science
  • Subject: Artificial intelligence; Computer science
  • ISBN: 9781321636390
  • CBH: 3686785
  • Country: USA
  • Language: English
  • File size: 4424749
  • Pages: 269
Abstract
Over the last several decades, a tremendous range of tools has been developed in machine learning, from statistical models to scalable algorithms and from learning strategies to diverse tasks, with far-reaching influence on applications ranging from image and speech recognition to recommender systems, and from bioinformatics to robotics. As we enter the era of big data, large-scale machine learning tools become increasingly important for training big models on big data. Since machine learning problems are fundamentally empirical risk minimization problems, large-scale optimization plays a key role in building a large-scale machine learning system. However, scaling optimization algorithms such as stochastic gradient descent (SGD) in a distributed system raises issues such as synchronization, since these algorithms were not designed for this purpose. Synchronization is required to guarantee consistency, i.e., that the parameters on different machines are the same, but it blocks computation and degrades the performance of the distributed system; without blocking, overwriting may occur and consistency cannot be guaranteed. Moreover, SGD may not be suitable for constrained optimization problems.

To address these issues, we develop several novel optimization algorithms suitable for distributed systems in two settings: unconstrained optimization and equality-constrained optimization. First, building on SGD in the unconstrained setting, we propose online randomized block coordinate descent, which randomly updates some parameters using some samples and thus allows the overwriting in SGD. Second, instead of striving to maintain consistency at every iteration in the unconstrained setting, we turn to equality-constrained optimization, which guarantees eventual consistency: the parameters on different machines are not the same at each iteration but become the same eventually. The equality-constrained setting also covers cases where SGD cannot be applied. The alternating direction method of multipliers (ADMM) provides a suitable framework for equality-constrained optimization, but it raises three issues: (1) it does not provide a systematic way to solve subproblems; (2) it requires solving all subproblems and synchronization; and (3) it is a batch method that cannot process data online. For the first issue, we propose Bregman ADMM, which provides a unified framework for solving subproblems efficiently. For the second, we propose the parallel direction method of multipliers (PDMM), which randomly picks some subproblems to solve and performs asynchronous aggregation. Finally, we introduce online ADMM, so that the algorithm can process partial data at each iteration.

To validate the effectiveness and scalability of the proposed algorithms, we apply them to a variety of applications, including sparse structure learning and maximum a posteriori (MAP) inference in probabilistic graphical models, as well as online dictionary learning. We also implement the proposed methods on various architectures, including clusters with hundreds to thousands of CPU cores and GPUs. Experimental results show that the proposed methods scale gracefully with the number of cores and outperform state-of-the-art methods.
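As a minimal sketch of the online randomized block coordinate idea described above, the snippet below draws one sample and one random block of coordinates per step and updates only that block; the least-squares objective, step size, and block layout are illustrative assumptions, not the dissertation's ORBCD algorithm.

# Minimal sketch (illustrative, not the thesis's ORBCD): online randomized
# block coordinate updates for least-squares regression. Each step uses one
# sample and one randomly chosen block of coordinates, so only that block of
# the parameter vector is touched.
import numpy as np

rng = np.random.default_rng(1)
n, d, n_blocks = 1000, 20, 4
blocks = np.array_split(np.arange(d), n_blocks)   # fixed coordinate blocks

X = rng.standard_normal((n, d))
w_true = rng.standard_normal(d)
y = X @ w_true + 0.1 * rng.standard_normal(n)

w = np.zeros(d)
eta = 0.05                          # assumed constant step size
for t in range(20000):
    i = rng.integers(n)             # draw one sample (online / stochastic)
    j = rng.integers(n_blocks)      # draw one coordinate block at random
    idx = blocks[j]
    resid = X[i] @ w - y[i]         # scalar residual for the drawn sample
    w[idx] -= eta * resid * X[i, idx]   # gradient step on the chosen block only

print("parameter error:", np.linalg.norm(w - w_true))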
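For reference, the equality-constrained reformulation discussed above is commonly written in global consensus form, with a parameter copy x_i on each of the N machines tied to a shared variable z; the updates below are the generic consensus ADMM template, not the Bregman or parallel variants developed in the thesis.

\min_{x_1,\dots,x_N,\; z} \; \sum_{i=1}^{N} f_i(x_i) \quad \text{s.t.} \quad x_i = z, \;\; i = 1,\dots,N

L_\rho(x, z, y) = \sum_{i=1}^{N} \Big( f_i(x_i) + y_i^{\top}(x_i - z) + \tfrac{\rho}{2}\,\|x_i - z\|_2^2 \Big)

x_i^{k+1} = \arg\min_{x_i} \; f_i(x_i) + (y_i^{k})^{\top}(x_i - z^{k}) + \tfrac{\rho}{2}\,\|x_i - z^{k}\|_2^2

z^{k+1} = \frac{1}{N} \sum_{i=1}^{N} \Big( x_i^{k+1} + \tfrac{1}{\rho}\, y_i^{k} \Big)

y_i^{k+1} = y_i^{k} + \rho\,\big( x_i^{k+1} - z^{k+1} \big)

The constraints x_i = z need to hold only at convergence, which is the eventual consistency the abstract refers to.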
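In the same spirit, the sketch below runs consensus ADMM while solving only a random subset of the local subproblems in each round, the kind of randomized subproblem selection that PDMM builds on; the least-squares local losses, the synchronous aggregation, and all names and sizes are assumptions for illustration rather than the thesis's PDMM.

# Minimal sketch (illustrative, not the thesis's PDMM): consensus ADMM in which
# only a random subset of the N local subproblems is solved per round. Local
# losses are f_i(x) = 0.5 * ||A_i x - b_i||^2, so each x_i-update is a linear
# solve; duals are kept in scaled form (u_i = y_i / rho).
import numpy as np

rng = np.random.default_rng(0)
N, d, m = 8, 5, 20              # machines, parameter dimension, samples per machine
rho, K, S = 1.0, 200, 3         # penalty, rounds, subproblems solved per round

A = [rng.standard_normal((m, d)) for _ in range(N)]
x_true = rng.standard_normal(d)
b = [Ai @ x_true + 0.1 * rng.standard_normal(m) for Ai in A]

x = [np.zeros(d) for _ in range(N)]   # local parameter copies
u = [np.zeros(d) for _ in range(N)]   # scaled dual variables
z = np.zeros(d)                       # shared consensus variable

for k in range(K):
    picked = rng.choice(N, size=S, replace=False)   # random subset of subproblems
    for i in picked:
        # x_i-update: argmin 0.5*||A_i x - b_i||^2 + (rho/2)*||x - z + u_i||^2
        H = A[i].T @ A[i] + rho * np.eye(d)
        x[i] = np.linalg.solve(H, A[i].T @ b[i] + rho * (z - u[i]))
    z = sum(x[i] + u[i] for i in range(N)) / N      # aggregate all copies
    for i in picked:
        u[i] += x[i] - z                            # dual step on updated blocks

print("consensus error vs. ground truth:", np.linalg.norm(z - x_true))

With least-squares losses each x_i-update is a small linear solve; for general losses the subproblem has no closed form, which is the issue the Bregman ADMM contribution above is designed to address.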
