
COLLOQUIUM: Xifeng Yan, "Adaptive Inference in Large Language Models"

Apr 1, 2026   3:30 pm  
HYBRID: 2405 Siebel Center for Computer Science or online
Sponsor
Siebel School of Computing and Data Science
Originating Calendar
Siebel School Colloquium Series

Zoom: https://illinois.zoom.us/j/85312617915?pwd=IvV6SehiTxie33cfT7ZsbeZ1cao0RM.1

Refreshments Provided.

Abstract: 
Transformer-based large language models (LLMs) have achieved remarkable success, yet many challenges remain. In this talk, I will address a fundamental question: Do all tokens require the same amount of computation within a Transformer? I will share insights into this question and introduce our dynamic layer-skipping algorithm for adaptive inference in pre-trained LLMs, where different tokens are generated using varying numbers of Transformer layers. Our findings show that many layers can be automatically skipped without degrading output quality. These skipped layers reveal a substantial amount of underutilized compute within Transformers, which can be further exploited to enable the generation of multiple tokens using only a subset of layers. We refer to this inference paradigm as Direct Multi-Token Decoding (DMTD). Unlike speculative decoding, our method introduces no additional parameters, no auxiliary routines, and requires no post-generation verification. Despite being trained on a limited dataset, it has demonstrated promising results on a fine-tuned Qwen3-4B model, achieving up to a 2x speedup with only minor performance degradation. Scaling analysis suggests further gains with larger training datasets. At the end of the talk, I will also briefly introduce our efforts on leveraging LLMs for knowledge discovery, such as multimodal models applied to the financial domain. 

Bio:
Xifeng Yan is a professor at the University of California, Santa Barbara, where he holds the Venkatesh Narayanamurti Chair in Computer Science. He received his Ph.D. in Computer Science from the University of Illinois at Urbana-Champaign in 2006 and was a research staff member at the IBM T. J. Watson Research Center from 2006 to 2008. His current research focuses on exploring foundation models in artificial intelligence, leveraging these models for knowledge discovery, and developing cross-disciplinary applications. His work has been widely cited and recognized with numerous honors. His team developed the first Transformer-based time series forecasting model, initiating a new research direction in the field.

Part of the Siebel School Speakers Series
Faculty Host: Jiawei Han


Meeting ID: 853 1261 7915
Passcode: csillinois


If accommodation is required, please email <erink@illinois.edu> or <communications@cs.illinois.edu>. Someone from our staff will contact you to discuss your specific needs.

