Deep learning methods stand out for their ability to handle complicated data and tasks. However, successfully applying cutting-edge deep learning methods usually requires substantial extra care (e.g., heuristic tricks, excessive hyper-parameter tuning, and data annotation costs). Given the inherent resource limitations of real-world applications, the demand for such effort has hindered many applications and research directions. Bearing this in mind, I strive to build productive algorithms that effectively make deep learning effort-light and easy to use.
In this talk, I will address the extra care required to train Transformer networks (the backbone of many recent breakthroughs, such as BERT). First, my analyses reveal that unbalanced gradients are not the root cause of unstable Transformer training, and they uncover a long-overlooked issue: model sensitivity to parameter updates. In light of these analyses, I stabilize Transformer training and achieve a new state of the art without introducing any additional hyper-parameters. Second, I identify a problem with the adaptive learning rate, which not only provides guidance on training configurations and further stabilizes model training, but also sheds light on the mystery of why learning rate warmup is necessary. Together, these two aspects form a comprehensive inspection of the extra care required for Transformers and simplify Transformer training by reducing it. In closing, I present a broader overview of my research and discuss how it can benefit biostatistics and biomedical informatics research.
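For readers unfamiliar with the learning rate warmup mentioned above, the following is a minimal illustrative sketch (not taken from the talk) of a common linear warmup schedule, where the learning rate is ramped up from near zero before training proceeds at the full rate; the function name and parameter values are hypothetical:

```python
def warmup_lr(step, base_lr=1e-3, warmup_steps=1000):
    """Linearly scale the learning rate from near 0 up to base_lr over
    warmup_steps, then hold it at base_lr (a decay schedule would
    typically continue from this point)."""
    if step < warmup_steps:
        return base_lr * (step + 1) / warmup_steps
    return base_lr

# Early steps use a tiny learning rate; after warmup the full rate applies.
lr_start = warmup_lr(0)      # a small fraction of base_lr
lr_end = warmup_lr(999)      # reaches base_lr at the end of warmup
lr_later = warmup_lr(5000)   # stays at base_lr afterwards
```

Warmup of this kind is widely used with adaptive optimizers such as Adam when training Transformers; the talk examines why it is needed.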
Liyuan Liu is a Ph.D. candidate in Computer Science at the University of Illinois at Urbana-Champaign, advised by Prof. Jiawei Han. He received his B.Eng. in Computer Science and Engineering from the University of Science and Technology of China in 2016. In his research, he strives to develop productive algorithms that effectively reduce the resource consumption of deep learning, including expert effort for data annotation and computational resources for tuning and training. Liyuan has published more than 20 papers at top-tier conferences during his Ph.D. study and has been awarded several fellowships and scholarships, including the 2020 Yee Fellowship and the 2015 Guo Moruo Scholarship. More information is available on his web page: http://liyuanlucasliu.github.io/
*This is the same talk that will be presented on 3/16.