Center for Global Studies


Harsh Agrawal "Towards multi-modal AI systems with 'open-world' cognition"

Event Type
Illinois Computer Science
Online event
Mar 23, 2023, 2:00 pm
Harsh Agrawal, final-year Ph.D. student, Georgia Tech
Candice Steidinger
Originating Calendar
Computer Science Speakers Calendar

We look forward to seeing you online.


A long-term goal in AI research is to build intelligent systems with 'open-world' cognition. When deployed in the wild, AI systems should generalize to novel concepts and instructions. Such an agent would need to perceive both familiar and unfamiliar concepts present in the environment, combine the capabilities of models trained on different modalities, and incrementally acquire new skills to continuously adapt to the evolving world. In this talk, we will look at how we can combine complementary multi-modal knowledge with suitable forms of reasoning to enable novel concept learning. In Part 1, we will show that agents can infer unfamiliar concepts in the presence of other familiar concepts by combining multi-modal knowledge with deductive reasoning. Furthermore, agents can use newly inferred concepts to update their vocabulary of known concepts and infer additional novel concepts incrementally. In Part 2, we will look at how we can use task-dependent augmentations to improve robustness in unseen environments. In Part 3, we will look at two realistic tasks that require understanding novel concepts: (1) evaluating an AI system's ability to describe novel objects present in an image, and (2) studying how embodied agents can combine perception with common-sense knowledge to perform household chores, such as tidying up the house, without any explicit human instruction, even in the presence of unseen objects in unseen environments. Finally, I will discuss some future directions that I am interested in.


Harsh Agrawal is a final-year Ph.D. student at Georgia Tech advised by Dhruv Batra. He also collaborates closely with Peter Anderson and Natasha Jaques at Google, and he has interned with Gal Chechik at NVIDIA and Marcus Rohrbach at Meta AI Research. His research lies at the intersection of computer vision and natural language processing, with a focus on visio-linguistic understanding for Embodied AI. His goal is to build multi-modal agents with 'open-world' cognition: agents that can reason about novel scenarios and learn new concepts incrementally in our dynamic physical world. In his free time, he helps maintain and manage EvalAI, an AI challenge hosting platform (part of the CloudCV project) that aims to make AI research more reproducible. Before this, he spent a couple of years as a Research Engineer at Snap Research, where he was responsible for building large-scale infrastructure for visual recognition and search and for developing algorithms for low-shot instance detection. He was the Rising Star Doctoral Student at Georgia Tech, and his Ph.D. was partially supported by a Snap Fellowship.
