OCR Event Manager - Master Calendar

AICE Center, NLP Large Group Series: Sewoong Lee, "Sparse Autoencoders: Discoveries, Limitations, and the Bridge Between Experiment and Theory."

Event Type

Seminar/Symposium

Sponsor

AICE Center

Location

2405 Siebel Center

Virtual

Join online

Date

Nov 5, 2025 1:00 - 2:00 pm

Speaker

Sewoong Lee

Contact

Prof. Dilek Hakkani-Tur

E-Mail

dilek@illinois.edu

Originating Calendar

Siebel School Speakers Calendar

Abstract: While the behavior of large language models (LLMs) has been extensively studied, the question of how such behaviors emerge is often answered with nothing more than "somehow." Mechanistic interpretability is a field that begins with this very question, aiming to uncover the internal mechanisms of LLMs. In this talk, I will start from a classic problem in cognitive science known as the grandmother neuron and move toward the modern method of identifying monosemantic neurons in today's LLMs using sparse autoencoders (SAEs). SAEs have led to many interesting discoveries, yet two major challenges remain: (1) the design of SAEs often relies on ad hoc assumptions, and (2) evaluating what makes a good SAE remains largely an open problem. Building on these observations, I will present a theoretical formalization that views the SAE as an architecture grounded in two hypotheses, the linear representation hypothesis (LRH) and the superposition hypothesis (SH), and show how this perspective can guide the design of better SAEs and provide a new way to evaluate them that has been overlooked in prior work.

Short Bio: Sewoong Lee is a PhD student at UIUC, advised by Professor Julia Hockenmaier. He received his Bachelor's degree from Seoul National University and his Master of Computer Science from UIUC. Before joining the PhD program, he worked as a Staff Engineer in Data Science at Samsung Electronics. His research has been featured at COLM. Sewoong has been awarded the Pre-doctoral Fellowship from UIUC and the National Science and Engineering Scholarship from the Korea Student Aid Foundation. His research interests lie at the intersection of AI, NLP, and mechanistic interpretability.
