Title: Vision and Language Learning: from 2D to 3D, and from Data Supervision to Data Generation
Abstract: Visual grounding has become one of the fundamental building blocks for many vision and language tasks. In this talk, we will first discuss recent trends in visual grounding, such as moving from 2D images to complex 3D scenes, and then introduce our recent work on automatically generating data by (1) employing an inverse task and (2) leveraging pre-trained language models (PLMs).
Bio: Dr. Liwei Wang is an Assistant Professor in the CSE department at CUHK. Before joining CUHK in Dec 2020, he worked at Tencent AI Lab in Bellevue, WA, for more than two years, leading multiple vision and language projects. He received his Ph.D. in Computer Science from UIUC, advised by Prof. Svetlana Lazebnik. At CUHK, he now leads the Language and Vision Lab (LaVi Lab). The main research focus of his lab is to build interactive AI systems that can not only understand and recreate the visual world but also communicate like human beings using natural language.