As we move towards the creation of embodied agents that understand natural language, several new challenges and complexities arise for grounding (e.g. complex state-spaces), planning (e.g. long horizons), and social interaction (e.g. asking for help or clarifications). In this talk, I'll discuss several recent results, both on improvements to embodied instruction following within ALFRED and on initial steps towards building agents that ask questions or model theory-of-mind.
Yonatan Bisk is an assistant professor of computer science in Carnegie Mellon's Language Technologies Institute. His group works on grounded and embodied natural language processing, placing perception and interaction as central to how language is learned and understood. This was a winding path: after receiving his PhD from the University of Illinois at Urbana-Champaign, where he worked on unsupervised Bayesian models of syntax, he spent several years as a postdoc and/or visitor at USC's ISI (grounding), the University of Washington (commonsense), and Microsoft Research (vision+language).