Abstract: In this talk, I will discuss my group’s work on cross-modal embeddings, visual grounding, and the application of these techniques to vision-language understanding tasks. I will also present our work on multi-task and incremental learning. Finally, I will briefly survey the state of the art in lifelong learning from images and text, and outline key challenges and directions for advancing the field.