Meeting ID: 862 3823 3298
Title: Scalable Supervision for Semantic and Geometric Vision
Abstract: Pairing deep neural networks with large training datasets has led to massive advances on a wide variety of vision tasks in the past decade. To continue scaling to larger and more complex data, we must develop scalable forms of supervision that do not rely on explicit human annotation. In contrast to generic unsupervised learning, my work aims to take advantage of additional forms of supervision natural for the task at hand. For semantic vision tasks, I will argue that paired vision+language data is an effective form of supervision that can be acquired at scale from the web. To this end, I will discuss our VirTex method for learning visual features from text, as well as our large-scale RedCaps dataset of image and text data. For geometric tasks, I will argue that the 3D structure of the world can be used for auxiliary supervision, and in particular that differentiable rendering is a core tool for bridging 2D data and 3D tasks without supervision. I will show how these ideas can be applied to a variety of 3D vision tasks including shape prediction, novel view synthesis, and point cloud registration.
Bio: Justin Johnson is an Assistant Professor of Computer Science and Engineering at the University of Michigan, Ann Arbor, and a Visiting Scientist at Facebook AI Research. He completed his PhD at Stanford University, advised by Fei-Fei Li. His research interests lie primarily in computer vision and include visual reasoning, vision and language, 3D perception, and differentiable rendering.