Title: 3D Vision with 3D View-Predictive Neural Scene Representations
Abstract: Current state-of-the-art perception models can localize rare object categories in internet photos, yet they miss basic facts that a two-year-old has mastered: objects have 3D extent, they persist over time despite changes in camera view, they do not intersect in 3D, and so on. We will discuss models that learn to map 2D and 2.5D images and videos into amodally completed 3D feature maps of the scene and the objects in it by predicting views. We will show that the proposed models learn object permanence, discover objects in 3D without human annotations, support grounding of language in 3D visual simulations, and learn intuitive physics that generalizes across scene arrangements and camera configurations. In this way, the proposed world-centric scene representations overcome many limitations of image-centric representations for video understanding, dynamics learning, control, and language grounding.
Bio: Katerina Fragkiadaki is an Assistant Professor in the Machine Learning Department at Carnegie Mellon University. She received her Ph.D. from the University of Pennsylvania and was subsequently a postdoctoral fellow at UC Berkeley and Google Research. Her work focuses on learning visual representations with little supervision and on combining spatial reasoning with deep visual learning. Her group develops algorithms for mobile computer vision and for learning physics and common sense for agents that move around and interact with the world. Her work has received a best Ph.D. thesis award, an NSF CAREER award, an AFOSR Young Investigator award, a DARPA Young Investigator award, and faculty research awards from Google, TRI, Amazon, and Sony.
Zoom Meeting: https://illinois.zoom.us/j/86238233298?pwd=Y29EWXRPOWtiZ09DczRYMXJZK3JRUT09
Meeting ID: 862 3823 3298