We are excited to host Professor Zhuang Liu from Princeton University for this week’s Vision External Speaker Series. His talk examines why dataset bias endures and presents his group’s ongoing efforts to combat it. Join us on Zoom and be ready to participate.
Here are the talk details:
When: Thursday, Oct. 16, 4–5 PM Central Time
Location: https://illinois.zoom.us/j/81812620565?pwd=VZRKnUabd8PByNxKJz1oHgfj4vWgev.1
Title: A Decade's Battle on Dataset Bias: Are We There Yet?
Abstract: Data is the prime ingredient of modern AI. To move toward artificial general intelligence, models must learn from datasets that are as broad and unbiased as possible. Yet today's large-scale vision datasets remain surprisingly skewed. Revisiting the decade-old "Name That Dataset" experiment, I show that a simple neural network classifier can guess an image's source dataset with over 80% accuracy, underscoring that bias persists and leaves recognizable fingerprints in each dataset. Controlled perturbation studies further reveal which visual attributes (color, semantics, geometry, etc.) carry the strongest signals. The problem isn't limited to vision: a lightweight classifier can identify the originating large language model from its text with 97% accuracy, exposing similarly sharp idiosyncrasies in model outputs and, by extension, their training data. These results demonstrate that scale alone cannot solve dataset bias, and I'll conclude by discussing our ongoing efforts and potential paths forward.
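For attendees curious how such a dataset-classification probe can be set up, the sketch below is a minimal, illustrative PyTorch version, not the speaker's actual code: it assumes images from several source datasets sit in per-dataset subfolders under a placeholder directory dataset_root/, and it trains a small ResNet to name the source. Accuracy well above chance suggests the classifier has latched onto dataset-specific fingerprints.

import torch
import torch.nn as nn
from torch.utils.data import DataLoader, random_split
from torchvision import datasets, models, transforms

transform = transforms.Compose([
    transforms.Resize(256),
    transforms.CenterCrop(224),
    transforms.ToTensor(),
])

# ImageFolder treats each subdirectory as one class, so each source
# dataset (e.g. dataset_root/yfcc/, dataset_root/cc/) becomes one label.
# "dataset_root" and the folder names are placeholders for this sketch.
data = datasets.ImageFolder("dataset_root", transform=transform)
n_val = len(data) // 10
train_set, val_set = random_split(data, [len(data) - n_val, n_val])
train_loader = DataLoader(train_set, batch_size=64, shuffle=True)
val_loader = DataLoader(val_set, batch_size=64)

device = "cuda" if torch.cuda.is_available() else "cpu"
model = models.resnet18(num_classes=len(data.classes)).to(device)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
criterion = nn.CrossEntropyLoss()

# Train the classifier to predict which dataset each image came from.
for epoch in range(5):
    model.train()
    for images, labels in train_loader:
        images, labels = images.to(device), labels.to(device)
        optimizer.zero_grad()
        loss = criterion(model(images), labels)
        loss.backward()
        optimizer.step()

# Evaluate: chance level is 1 / number of source datasets, so accuracy
# far above that indicates the sources are visually distinguishable.
model.eval()
correct = total = 0
with torch.no_grad():
    for images, labels in val_loader:
        preds = model(images.to(device)).argmax(dim=1)
        correct += (preds == labels.to(device)).sum().item()
        total += labels.size(0)
print(f"dataset-classification accuracy: {correct / total:.1%}")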
Speaker Bio: Zhuang Liu is an Assistant Professor of Computer Science at Princeton University. His research areas are deep learning and computer vision, with an emphasis on empirical approaches to understanding how models work and behave. His work spans vision and language, unified by a focus on deep learning methods, representations, and architectures. Prior to joining Princeton, he was a Research Scientist at Meta AI Research (FAIR) in New York City. He received his Ph.D. from UC Berkeley and his B.E. from Tsinghua University, both in Computer Science. He is a recipient of the CVPR Best Paper Award.