An important characteristic of neural networks is their ability to learn representations of the input data with effective features for prediction, which is believed to be a key factor in their superior empirical performance. To better understand the source and benefit of feature learning in neural networks, we consider learning problems motivated by practical data, where the labels are determined by a set of class-relevant patterns and the inputs are generated from these patterns along with some background patterns. We prove that neural networks trained by gradient descent can succeed on these problems. The success relies on the emergence and improvement of effective features, which are learned efficiently among exponentially many candidates by exploiting the data (in particular, the structure of the input distribution). In contrast, no linear model on data-independent features of polynomial size can achieve comparably small error. Furthermore, if the specific input structure is removed, then no polynomial algorithm in the Statistical Query model can learn even weakly. These results provide theoretical evidence that feature learning in neural networks depends strongly on the input structure and leads to their superior performance. Our preliminary experimental results on synthetic and real data also provide supporting evidence.
Yingyu Liang is an assistant professor at the University of Wisconsin-Madison, with research interests in optimization and generalization of deep learning, robust machine learning, and their applications. He received a B.S. in 2008 and an M.S. in 2010 in Computer Science from Tsinghua University, and a Ph.D. in Computer Science from Georgia Institute of Technology in 2014. From 2014 to 2017 he was a postdoctoral researcher in the Computer Science Department at Princeton University. He received the NSF CAREER Award in 2021.