Title: Bringing Robust Robots Into Our Chaotic World
Embodied agents are often trained in simulators to perform tasks such as navigation, manipulation, and instruction following. Although simulators have become more realistic over the years, they still offer only a limited number of training scenes. Because these tasks are complex and require long planning horizons, even the best-performing agents overfit to the limited training scenes and generalize poorly to unseen environments. In this talk I will present ProcTHOR, a framework for the procedural generation of Embodied AI environments that enables us to sample arbitrarily large datasets of diverse, interactive, customizable, and performant virtual environments. Pre-training on ProcTHOR scenes improves generalization and produces state-of-the-art agents for several tasks. I will then present Phone2Proc, a method that takes a 10-minute phone scan of a real-world scene and uses conditional procedural generation to create a distribution of training scenes that lie semantically close to the target environment. Sampling from this distribution produces scenes that respect the original layout and arrangement of large objects but vary in lighting, clutter, textures, and small objects. Fine-tuning on these scenes yields large improvements on object navigation in real physical environments. Next, I will present Objaverse, a new large-scale repository of over 800K 3D models paired with tags and descriptions, which can be used to vastly expand the visual vocabulary of agents trained in simulation and further improve generalization. Finally, I will address the evaluation of agents beyond typical task-specific success measures. I will present InfLevel, a benchmark inspired by studies of infants in developmental psychology, for probing core physical reasoning capabilities of models, such as Continuity, Solidity, and Gravity.
Bio: Ani Kembhavi is the Director of Computer Vision at the Allen Institute for Artificial Intelligence (AI2) in Seattle. He is also an Affiliate Associate Professor in the Computer Science & Engineering department at the University of Washington. He obtained his PhD from the University of Maryland, College Park, under the supervision of Prof. Larry S. Davis, and spent 5 years at Microsoft building its image and video search engines. His research interests lie at the intersection of computer vision, natural language processing, and embodiment. His work has received an Outstanding Paper Award at NeurIPS 2022, an AI2 Test of Time Award in 2020, and an NVIDIA Pioneer Award in 2018.