We look forward to seeing you online on Thursday, January 25.
Abstract: Generative AI has led to stunning successes in recent years. These models can synthesize realistic and complex visual content conditioned on representations of internal states such as text descriptions, evidently capturing general visual knowledge. Alongside efforts to enhance the quality of content generation, my research concentrates on the inverse: extracting the general visual knowledge from these generative models for generalizable visual understanding. In this talk, I’ll first show how using a diffusion model as a self-supervised objective yields visual representations adept at both recognition and generation. I’ll then illustrate how unpacking a well-trained text-to-image model leads to image captions with accuracy and comprehensiveness unseen in its uncurated training data. Finally, I’ll discuss how the compositional nature of generative models leads to effective extrapolation beyond the training data.
Bio: Chen Wei is a fifth-year Ph.D. student at Johns Hopkins University, advised by Professor Alan L. Yuille. Her research focuses on computer vision, with a particular interest in developing generalizable and scalable visual representations. Chen received her B.Sc. from Peking University. She has interned at FAIR at Meta and at Google DeepMind.