Computer Vision Seminar Series: Dr. Shuyang (Kevin) Sun, "D4RT: Teaching AI to see the world in four dimensions."

- Sponsor: Illinois Computer Vision
- Speaker: Dr. Shuyang (Kevin) Sun
- Contact: Yao Xiao (yaox11@illinois.edu)
- Originating Calendar: Siebel School Speakers Calendar
Abstract: Understanding and reconstructing the complex geometry and motion of dynamic scenes from video remains a formidable challenge in computer vision. This paper introduces D4RT, a simple yet powerful feedforward model designed to solve this task efficiently. D4RT uses a unified transformer architecture to jointly infer depth, spatio-temporal correspondence, and full camera parameters from a single video. Its core innovation is a querying mechanism that sidesteps both the heavy computation of dense, per-frame decoding and the complexity of managing multiple task-specific decoders. This decoding interface lets the model independently and flexibly probe the 3D position of any point in space and time. The result is a lightweight, highly scalable method that enables remarkably efficient training and inference. We demonstrate that our approach sets a new state of the art, outperforming previous methods across a wide spectrum of 4D reconstruction tasks.
Speaker Bio: Shuyang (Kevin) Sun (https://scholar.google.com/citations?user=PoAvGRMAAAAJ&hl=en) is a Research Scientist at Google DeepMind. His research spans computer vision, visual perception, and visual understanding. He currently focuses on advancing open-world unified visual perception, including recent contributions to 4D spatio-temporal reconstruction and simulation. Before joining DeepMind, Shuyang was a Research Scientist at ByteDance. He earned his Ph.D. from the University of Oxford under the supervision of Professor Philip Torr.