Computer Science Speakers Series

Tuo Zhao "On Fine-Tuning of Pretrained Language Models under Limited or Weak Supervision"

Event Type
Seminar/Symposium
Sponsor
The Department of Computer Science, University of Illinois, BLENDER Lab
Location
https://illinois.zoom.us/j/8167899060?pwd=YkZrQ09zODRzL0txRGF5bnhWdmk0UT09
Virtual
Date
Feb 12, 2021   10:00 am  
Contact
Candice Steidinger
E-Mail
steidin2@illinois.edu
Phone
217-300-8564
Originating Calendar
Computer Science Speakers Calendar

Abstract: 

Transfer learning has fundamentally changed the landscape of natural language processing (NLP). Many state-of-the-art models are first pre-trained on a large text corpus and then fine-tuned on downstream tasks. When only limited or weak supervision is available for the downstream tasks, however, aggressive fine-tuning often causes the model to overfit the downstream training data and fail to generalize to unseen data, owing to the extremely high complexity of pre-trained models.

To address this concern, we propose a new approach to fine-tuning pre-trained models that attains better generalization performance. The approach combines three ingredients: (1) smoothness-inducing regularization, which effectively manages the complexity of the massive model; (2) Bregman proximal point optimization, which is an instance of trust-region methods and can prevent aggressive updating; and (3) self-training, which can gradually improve the model fit and effectively suppress error propagation. Our experiments show that the proposed approach significantly outperforms existing methods on multiple NLP tasks under limited or weak supervision.

Bio: 

Tuo Zhao (https://www2.isye.gatech.edu/~tzhao80/) is an assistant professor in the School of Industrial & Systems Engineering at Georgia Tech. He received his Ph.D. in Computer Science from Johns Hopkins University. His research focuses mainly on developing methodologies, algorithms, and theories for machine learning, especially deep learning. He also works actively on neural language models and on open-source machine learning software for scientific data analysis. His honors include a win in the INDI ADHD-200 global competition, the ASA best student paper award on statistical computing, the INFORMS best paper award on data mining, and a Google faculty research award.
