Humans live in a dynamic world. By nature, we continuously predict how the surrounding environment will change and how other people will act over time. These predictions are critical for shaping our daily interactions with the world. However, this ability is still largely missing from modern artificial intelligence systems. In this talk, I will present our efforts toward endowing machine learning systems with predictive learning ability, using 3D human motion prediction as an illustrative task. The core idea is to leverage the rich yet implicit structural dependencies and regularities inherent in motion sequences, including geometric, temporal, contextual, attentional, and model parameter structures, without any additional supervision. By incorporating the desired structural knowledge into a deep-learning-based framework, I will show that we can forecast realistic, human-like, and diverse future motion in both short-term and long-term scenarios with significantly less annotated motion capture data. I will also demonstrate the application of our prediction model to human-robot interaction and discuss ongoing work on in-the-wild prediction, with the ultimate goal of building autonomous agents that perceive, interpret, and interact with the dynamic world.
Liangyan Gui is a scientist at Argo AI, LLC. She was previously a postdoctoral fellow at Carnegie Mellon University, where she received her Ph.D. in Electrical and Computer Engineering in 2019 under the supervision of José M. F. Moura. Before that, she received her M.S. and B.E. in Electronic Engineering from Tsinghua University. Her research interests lie in computer vision, machine learning, and robotics, with a particular focus on predictive learning and on combining geometry-based and learning-based approaches. She has also spent time at Google and Facebook Reality Labs.
Faculty Host: Derek Hoiem