Final Exam - Nonconvex Min-max Optimization in Deep Learning: Algorithms and Applications

July 3, 2020 - 10:00am
This exam will be held remotely.

PhD Candidate: Mingrui Liu


Nonconvex min-max optimization has received increasing attention in modern machine learning, especially in the context of deep learning. Examples include stochastic AUC maximization with deep neural networks and Generative Adversarial Nets (GANs), which correspond to nonconvex-concave and nonconvex-nonconcave min-max problems, respectively. The classical theory of min-max optimization focuses mainly on the convex-concave setting and is therefore not applicable to deep learning applications with nonconvex min-max formulations. A natural question arises: how can we design provably efficient algorithms for nonconvex min-max problems in deep learning? To answer this question, this dissertation focuses on the following four concrete aspects:

First, we consider stochastic AUC maximization with deep neural networks as predictive models. Building on the saddle-point reformulation of a surrogate loss of AUC, the problem can be cast as a nonconvex-concave min-max problem. We exploit the Polyak-Lojasiewicz (PL) condition, which has been proved and observed in deep learning, and develop new stochastic algorithms with a faster convergence rate and a more practical step size scheme.
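The kind of saddle-point reformulation referred to above can be sketched as follows; this is the standard square-loss surrogate formulation from the stochastic AUC maximization literature, with the notation (p, h, a, b, alpha) introduced here for illustration:

```latex
\min_{\mathbf{w},\,a,\,b}\ \max_{\alpha \in \mathbb{R}}\
\mathbb{E}_{(x,y)}\Big[
  (1-p)\,\big(h(\mathbf{w};x)-a\big)^2\,\mathbb{I}[y=1]
  + p\,\big(h(\mathbf{w};x)-b\big)^2\,\mathbb{I}[y=-1]
  \\ \qquad
  + 2(1+\alpha)\big(p\,h(\mathbf{w};x)\,\mathbb{I}[y=-1]
  - (1-p)\,h(\mathbf{w};x)\,\mathbb{I}[y=1]\big)
  - p(1-p)\,\alpha^2 \Big],
```

where $p = \Pr(y=1)$ and $h(\mathbf{w};x)$ is the score of the predictive model. The objective is a concave quadratic in $\alpha$, but nonconvex in $\mathbf{w}$ when $h$ is a deep network, which is exactly the nonconvex-concave structure.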

Second, we consider the first-order convergence theory for weakly-convex-weakly-concave min-max problems. We propose an algorithmic framework motivated by the inexact proximal point method: the weakly monotone variational inequality (VI) corresponding to the original min-max problem is solved by approximately solving a sequence of strongly monotone VIs, each constructed by adding a strongly monotone mapping to the original gradient mapping. We prove non-asymptotic first-order convergence to a nearly stationary point of the original min-max problem in polynomial time.
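In symbols, the framework described above can be sketched as follows, with the notation $F$, $\gamma$, $\rho$, $\varepsilon_k$ introduced here for illustration. Writing $z = (x, y)$ and letting $F(z) = (\nabla_x f(x,y), -\nabla_y f(x,y))$ be the $\rho$-weakly monotone gradient mapping, each stage regularizes around the current iterate $z_k$ and approximately solves the resulting strongly monotone VI:

```latex
F_k(z) \;=\; F(z) + \gamma\,(z - z_k), \qquad \gamma > \rho,
\\[4pt]
\text{find } z_{k+1} \text{ such that } \quad
\langle F_k(z_{k+1}),\, z - z_{k+1} \rangle \;\ge\; -\,\varepsilon_k
\quad \text{for all } z \in \mathcal{Z}.
```

Adding the term $\gamma\,(z - z_k)$ makes $F_k$ $(\gamma - \rho)$-strongly monotone, so each subproblem can be solved efficiently by standard stochastic methods even though the original mapping is only weakly monotone.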

Third, we consider a class of nonconvex-nonconcave min-max problems, with a focus on GAN training. Although adaptive gradient methods with alternating updates empirically work well for training GANs, they require expensive tuning effort, lack theoretical convergence guarantees, and might diverge in practice. To bridge this gap, we design an adaptive gradient algorithm for training GANs with provably faster convergence than its non-adaptive counterpart.
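To make "alternating adaptive updates" concrete, here is a generic AdaGrad-style sketch on a toy convex-concave objective of my own choosing; this illustrates the update pattern only, not the dissertation's algorithm:

```python
import numpy as np

# Toy convex-concave objective f(x, y) = x^2 + 2xy - y^2: minimize over x,
# maximize over y; the unique saddle point is (0, 0).
def grad_x(x, y):
    return 2 * x + 2 * y   # df/dx

def grad_y(x, y):
    return 2 * x - 2 * y   # df/dy

x, y = 1.0, 1.0
sx = sy = 0.0              # accumulated squared gradients (AdaGrad-style)
lr, eps = 0.5, 1e-8

for _ in range(2000):
    gx = grad_x(x, y)
    sx += gx ** 2
    x -= lr * gx / (np.sqrt(sx) + eps)   # descent step for the min player
    gy = grad_y(x, y)                    # alternating: y reacts to the new x
    sy += gy ** 2
    y += lr * gy / (np.sqrt(sy) + eps)   # ascent step for the max player

print(x, y)  # both iterates approach the saddle point (0, 0)
```

The per-coordinate denominators adapt the effective step size to the observed gradient magnitudes, which is precisely the behavior that is convenient in practice but complicates convergence analysis in the nonconvex-nonconcave regime.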

Fourth, we consider large-scale GAN training in a decentralized distributed manner. Decentralized parallel algorithms are more robust to limited network bandwidth and high latency than their centralized counterparts, and they have merits for protecting users' privacy; however, decentralized algorithms for nonconvex-nonconcave min-max optimization have not been considered in the existing literature. We propose and analyze a decentralized algorithm for training GANs, and show its provable convergence to a first-order stationary point in polynomial time.
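The communication pattern underlying such decentralized algorithms can be illustrated with a minimal gossip-averaging sketch; the ring topology and uniform mixing weights below are my own choices for illustration, not the dissertation's setup:

```python
import numpy as np

# Gossip averaging on a ring of n nodes: each node repeatedly mixes its
# parameter copy with its two neighbors' copies via a doubly stochastic
# matrix W, without any central coordinator.
n = 5
W = np.zeros((n, n))
for i in range(n):
    W[i, i] = W[i, (i - 1) % n] = W[i, (i + 1) % n] = 1.0 / 3.0

theta = np.arange(n, dtype=float)   # each node starts with a different value

for _ in range(100):                # repeated mixing rounds
    theta = W @ theta

print(theta)  # all copies converge to the global mean 2.0
```

In a full decentralized training algorithm, each mixing round would be interleaved with local stochastic gradient steps on each node's own data shard, so no single machine ever needs to see all the data or all the parameters' gradients.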

Advisor: Tianbao Yang

Please contact Mingrui for further details if you wish to join his final exam.