Multi-object tracking (MOT) is an important computer vision task in high demand across many industrial applications, such as autonomous driving, smart cities, and robotic vision. However, due to unreliable detections and occlusions in real-world scenarios, objects can easily be lost by trackers. In this talk, I will present a novel tracker, the TrackletNet Tracker (TNT), which exploits the connectivity between tracklets using temporal convolutions. First, we associate consecutive detections into tracklets. Then, a tracklet-based graph model is built and the connectivity between tracklets is measured. Finally, we perform tracking by applying clustering approaches to the tracklet graph. The proposed TNT shows strong capability across different kinds of scenarios, such as human and vehicle tracking, and can also be extended to less constrained environments, such as unmanned aerial vehicle (UAV) and underwater scenarios. I will also present several extensions of TNT, including how we convert offline tracking to online tracking and how we extend 2D tracking to 3D tracking when tracking meets visual odometry and ground-plane estimation.
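The tracklet-graph pipeline above can be sketched in a few lines. This is a minimal illustration, not the actual TNT implementation: the similarity function below is a toy stand-in (TNT measures connectivity with a learned temporal-convolution network), and all names and thresholds are illustrative assumptions.

```python
from itertools import combinations

def tracklet_similarity(t1, t2):
    # Toy similarity between two tracklets, each a dict with start/end
    # frames and a 1-D appearance feature (a stand-in for a real
    # appearance embedding). TNT learns this connectivity instead.
    if t1["end"] >= t2["start"] and t2["end"] >= t1["start"]:
        return 0.0  # temporally overlapping tracklets cannot be one object
    gap = min(abs(t2["start"] - t1["end"]), abs(t1["start"] - t2["end"]))
    appearance = 1.0 - abs(t1["feat"] - t2["feat"])
    return appearance / (1 + 0.1 * gap)  # penalize long temporal gaps

def cluster_tracklets(tracklets, threshold=0.5):
    # Cluster the tracklet graph by connected components over edges whose
    # similarity exceeds the threshold (union-find), merging tracklets
    # into full trajectories.
    parent = list(range(len(tracklets)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i, j in combinations(range(len(tracklets)), 2):
        if tracklet_similarity(tracklets[i], tracklets[j]) > threshold:
            parent[find(i)] = find(j)

    clusters = {}
    for i in range(len(tracklets)):
        clusters.setdefault(find(i), []).append(i)
    return list(clusters.values())
```

For example, two temporally disjoint tracklets with similar appearance are merged into one trajectory, while a tracklet with a very different appearance stays separate.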
Moreover, I will discuss three ongoing works beyond tracking, as follows.
1) Inverted tracking. Instead of focusing on detection association, I will explain how to invert the tracking procedure, using tracking information and prior knowledge in less constrained environments to obtain coarse localization estimates.
2) End-to-end embedding. Cropped bounding-box images from the detector are commonly used for appearance embedding. I will show how we can integrate distance metric learning into an existing detection framework to learn feature embeddings in an end-to-end manner.
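A common building block of such distance metric learning is a triplet loss, which pulls embeddings of the same identity together and pushes different identities apart. The sketch below shows only this generic loss under assumed toy inputs; the specific detector integration described in the talk is not reproduced here.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    # Hinge on the squared-distance gap: the anchor should be closer to
    # the positive (same identity) than to the negative (different
    # identity) by at least `margin`.
    d_pos = np.sum((anchor - positive) ** 2)
    d_neg = np.sum((anchor - negative) ** 2)
    return max(0.0, d_pos - d_neg + margin)
```

When the negative is already far from the anchor, the loss is zero and no gradient is needed; when the negative is as close as the positive, the loss falls back to the margin value.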
3) 3D human pose tracking and missing pose recovery. Building on pose tracking, I will show how we address the main challenges in 3D pose estimation, including multiple people, multiple frames, and missing poses.