Abstract: Models such as DINO, SAM, CLIP, and MASt3R have transformed the field of robotic scene understanding, providing a leap in capabilities with regard to generalization, semantic understanding, and scene reconstruction. Given these stronger prior models, this talk proposes rethinking the role of mapping in scene understanding and discusses the new frontiers of the field.
Speaker Bio: Hermann Blum studied Electrical Engineering and Information Technology at ETH Zürich in Switzerland, with a one-year stay at Imperial College London in the UK. He obtained his PhD from ETH Zürich in 2022 under the supervision of Prof. Roland Siegwart. After an internship at Google Research, he joined the lab of Prof. Marc Pollefeys at ETH Zürich as a postdoc, leading the robotics team. In 2024 he started as a Junior Professor at the University of Bonn, where he leads the Robot Perception and Learning Lab.
Hermann's research focuses on machine learning for robotic perception and scene understanding, developing models and methods to understand an agent's environment semantically and geometrically. He is best known for his public benchmarks on anomaly detection in semantic segmentation, as well as for the workshops he regularly organizes at robotics and computer vision conferences.