OCR Event Manager - Master Calendar

AICE Center, NLP Large Group Series: Sewoong Lee, "Sparse Autoencoders: Discoveries, Limitations, and the Bridge Between Experiment and Theory."

Event Type
Seminar/Symposium
Sponsor
AICE Center
Location
2405 Siebel Center
Virtual
Join online
Date
Nov 5, 2025, 1:00 - 2:00 pm
Speaker
Sewoong Lee
Contact
Prof. Dilek Hakkani-Tur
E-Mail
dilek@illinois.edu
Originating Calendar
Siebel School Speakers Calendar

Abstract: While the behavior of large language models (LLMs) has been extensively studied, the question of how such behaviors emerge is often answered with nothing more than "somehow." Mechanistic interpretability is a field that begins with this very question, aiming to uncover the internal mechanisms of LLMs. In this talk, I will start from a classic problem in cognitive science known as the grandmother neuron, and move toward the modern method of identifying monosemantic neurons in today's LLMs using the sparse autoencoder (SAE). SAEs have led to many interesting discoveries, yet two major challenges remain: (1) the design of SAEs often relies on ad hoc assumptions, and (2) evaluating what makes a good SAE remains largely an open problem. Building on these observations, I will present a theoretical formalization that views the SAE as an architecture grounded in two hypotheses, the linear representation hypothesis (LRH) and the superposition hypothesis (SH), and show how this perspective can guide the design of better SAEs and provide a way to evaluate them that has been overlooked in prior work.

Short Bio: Sewoong Lee is a PhD student at UIUC, advised by Professor Julia Hockenmaier. He received his Bachelor's degree from Seoul National University and his Master of Computer Science from UIUC. Before joining the PhD program, he worked as a Staff Engineer in Data Science at Samsung Electronics. His research has been featured at COLM. Sewoong has been awarded the Pre-doctoral Fellowship from UIUC and the National Science and Engineering Scholarship from the Korea Student Aid Foundation. His research interests lie at the intersection of AI, NLP, and mechanistic interpretability.
