National Center for Supercomputing Applications master calendar

Machine Learning Seminar: Dylan Zhang, "Supervision Shall Fit the Model During Supervised Fine-tuning."

Event Type: Seminar/Symposium
Sponsor: CS 591 MLR Organizers
Location: 1304 Siebel Center (also offered virtually; join online)
Date: Nov 7, 2025, 2:00 - 3:15 pm
Speaker: Dylan Zhang
Contact: Allison Mette
E-Mail: agk@illinois.edu
Originating Calendar: Siebel School Speakers Calendar

Abstract: Instruction tuning has often been viewed through the lens of distilling “better” behavior from stronger teacher models. However, this framing can misalign with the actual inductive biases and capabilities of the target base model. In this talk, we revisit what truly matters for efficient and effective offline training. We introduce GRAPE, a model-aware data selection strategy designed to align supervision with the model itself. For each instruction, GRAPE selects the candidate response to which the target model assigns the highest likelihood, and fine-tunes on that response—producing supervision that is distributionally well-matched to the model rather than imitative of an external teacher. Beyond GRAPE, we extend our discussion by situating this approach within a broader rethinking of supervision for supervised fine-tuning (SFT). We highlight two complementary directions in recent literature: (1) alternative post-training objectives that deviate from pure next-token likelihood to better fit models at different capability levels, and (2) token-level reweighting and controlled update schemes such as TALR, which preserve general competence while enabling targeted specialization. Together, these perspectives point toward a unifying view: effective instruction tuning is not about enforcing an abstract notion of “best answers,” but about curating both data and loss functions to fit the model we actually want to deploy.
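
To make the selection rule concrete, below is a minimal Python sketch of GRAPE-style, model-aware response selection: for each instruction, every candidate response is scored by the log-likelihood the target base model assigns to it, and the highest-scoring candidate is kept for fine-tuning. The model name, helper functions, and scoring details are illustrative assumptions for this sketch, not the speaker's implementation.

    # Illustrative sketch of GRAPE-style, model-aware response selection.
    # Model name, data format, and scoring details are assumptions, not the paper's code.
    import torch
    from transformers import AutoModelForCausalLM, AutoTokenizer

    MODEL_NAME = "gpt2"  # stand-in for the target base model being fine-tuned
    tokenizer = AutoTokenizer.from_pretrained(MODEL_NAME)
    model = AutoModelForCausalLM.from_pretrained(MODEL_NAME).eval()

    @torch.no_grad()
    def response_log_likelihood(instruction: str, response: str) -> float:
        """Sum of log-probabilities the target model assigns to the response tokens,
        conditioned on the instruction."""
        prompt_ids = tokenizer(instruction, return_tensors="pt").input_ids
        full_ids = tokenizer(instruction + response, return_tensors="pt").input_ids
        logits = model(full_ids).logits
        # Log-probs at position t predict the token at position t + 1.
        log_probs = torch.log_softmax(logits[:, :-1, :], dim=-1)
        targets = full_ids[:, 1:]
        token_lp = log_probs.gather(-1, targets.unsqueeze(-1)).squeeze(-1)
        # Keep only the positions that predict response tokens (skip the prompt).
        start = prompt_ids.shape[1] - 1
        return token_lp[:, start:].sum().item()

    def select_response(instruction: str, candidates: list[str]) -> str:
        """Pick the candidate the target model already finds most likely."""
        return max(candidates, key=lambda r: response_log_likelihood(instruction, r))

    # Usage: the chosen (instruction, response) pair would then feed standard SFT.
    chosen = select_response(
        "Explain gradient clipping in one sentence.",
        ["Gradient clipping rescales large gradients to stabilize training.",
         "It is a regularizer that adds noise to the weights."],
    )
    print(chosen)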

Bio: Dylan Zhang is a Ph.D. student in Computer Science at the University of Illinois Urbana-Champaign (UIUC), advised by Prof. Hao Peng. His research focuses on large language model (LLM) post-training, particularly on developing offline training algorithms for efficient and effective model alignment. More broadly, he is interested in understanding the behavior, generalization, and inductive biases of large language models—how they learn from data, adapt through supervision, and exhibit emergent capabilities.
