Understanding the role of training algorithms in over-parameterized learning: insights from case studies

Yuan Cao, Assistant Professor in the Department of Statistics and Actuarial Science and the Department of Mathematics, The University of Hong Kong
Friday, May 5, 2023, 10:00–11:30 am
Zoom meeting

Abstract: Modern machine learning models such as large language models often contain a huge number of parameters. For such over-parameterized models, there can be infinitely many minimizers of the training loss function, and different training algorithms may therefore converge to different solutions. While these solutions may all achieve zero training error, their prediction errors can differ drastically. To understand large machine learning models, it is thus necessary to understand the impact of training algorithms on prediction error. This talk covers some recent studies along this research direction. In the first part of the talk, we give a theoretical explanation of the generalization gap between stochastic gradient descent and Adam. Specifically, we show that for certain learning problems, gradient descent can train a two-layer convolutional neural network (CNN) to achieve close-to-zero test error, while Adam can only achieve constant-level test error. In the second part of the talk, we present an "implicit bias" result for batch normalization. We prove that when learning a linear model with batch normalization for binary classification, gradient descent converges to a "uniform margin classifier" on the training data. This result also extends to a class of simple linear CNNs, where batch normalization has an implicit bias towards a "patch-wise uniform margin". Based on these results, we further demonstrate that models with batch normalization can outperform those without it in certain learning problems.
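The abstract's premise can be seen in a toy experiment. The sketch below (illustrative only, not from the talk) fits an under-determined linear regression with two optimizers: plain gradient descent and a sign-based update, the latter used here as a crude stand-in for Adam's coordinate-wise normalization. Both drive the training loss to (near) zero, yet they reach different minimizers, which is exactly the setting where prediction error can depend on the algorithm. All variable names and hyperparameters are assumptions for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 2, 5                          # over-parameterized: more parameters than samples
X = rng.standard_normal((n, d))
w_star = np.zeros(d)
w_star[0] = 1.0                      # ground truth depends on one feature
y = X @ w_star

def loss_and_grad(w):
    r = X @ w - y
    return r @ r / n, 2.0 * X.T @ r / n

# Plain gradient descent from zero initialization.
w_gd = np.zeros(d)
for _ in range(20000):
    _, g = loss_and_grad(w_gd)
    w_gd -= 0.05 * g

# Sign-based descent (Adam-like coordinate-wise normalization, simplified),
# with a decaying step size so the iterates settle on the solution set.
w_sign = np.zeros(d)
lr = 0.01
for _ in range(20000):
    _, g = loss_and_grad(w_sign)
    w_sign -= lr * np.sign(g)
    lr *= 0.9995

loss_gd, _ = loss_and_grad(w_gd)
loss_sign, _ = loss_and_grad(w_sign)
print("train loss (GD):  ", loss_gd)     # near zero
print("train loss (sign):", loss_sign)   # also near zero
print("distance between minimizers:", np.linalg.norm(w_gd - w_sign))
```

Both runs interpolate the training data, but the gap between `w_gd` and `w_sign` is non-trivial: gradient descent stays in the row space of `X` (the minimum-norm interpolator), while the sign-based updates accumulate components outside it. The talk's results formalize when such algorithm-dependent solutions lead to very different test errors.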

Short Bio: Yuan Cao is an assistant professor in the Department of Statistics and Actuarial Science and Department of Mathematics at the University of Hong Kong. Before joining HKU, he was a postdoctoral scholar at UCLA. He received his B.S. from Fudan University and Ph.D. from Princeton University. Yuan’s research interests include deep learning theory, non-convex optimization, and high-dimensional statistics.