Distributed Machine Learning

September 11, 2015 - 4:00pm to 5:00pm
110 MLH
Huy Lê Nguyễn
Toyota Technological Institute at Chicago (TTIC)

We study the efficiency and fundamental limits of distributed learning algorithms. Generally the data is distributed on many machines and the goal is to solve the learning tasks using as little communication, or as few rounds of communication, as possible.

We first discuss a generic algorithmic technique for distributed constrained submodular maximization, which encapsulates a wide variety of problems in machine learning including exemplar clustering, document summarization, and sensor placement. We demonstrate its efficiency in large problems with different kinds of constraints with objective values always close to what is achievable in the centralized setting.

We then study the fundamental limits of statistical estimation in the distributed setting and in particular the tradeoff between the statistical error and communication. We consider the distributed sparse Gaussian mean estimation problem and provide a tight (up to logarithmic factors) tradeoff between the estimation error and the number of bits communicated between the machines via a distributed data processing inequality, as a generalization of usual data processing inequalities.

Bio: Huy Lê Nguyễn is currently a Research Assistant Professor at TTIC. He received his B.S. in Computer Science and Mathematics and M.Eng. in Computer Science from MIT in 2008 and 2009, respectively, and his Ph.D. in Computer Science from Princeton University in 2014. Before TTIC, he spent a year as a Google Research Fellow at the Simons Institute at UC Berkeley.