Our lives are being rapidly reshaped by ubiquitous intelligent systems, from autonomous driving to AI-assisted medical care, from human identification and surveillance to smart cities, from factory automation to precision agriculture, and even to scientific endeavors such as astronomical missions.
Just as humans rely heavily on their vision to perform day-to-day tasks, such intelligent systems depend on accurate visual understanding. Although computer vision has advanced remarkably in the deep learning era, accurate visual understanding remains a challenging problem and an active research area.
This talk presents recent research on accurate visual understanding, from images to videos, from object detection to activity recognition, and beyond. Starting with objects, the talk introduces novel approaches that effectively improve the performance of modern object detectors and set a new state of the art. Turning to activities, it presents an object-centric spatio-temporal activity recognition system that ranked top in the NIST/IARPA TRECVID Activity Recognition Challenge. Finally, the talk concludes with a brief look at accurate visual understanding research in real-world AI applications.