Abstract: Using pretrained representations has become ubiquitous in computer vision and natural language processing. Pretraining unlocks better generalization, significant improvements on downstream tasks, and faster convergence. In image synthesis, however, the merits of pretrained representations remain underutilized. In our work, we explore how to leverage pretrained networks to shape GANs into causal structures (“Counterfactual Generative Networks”), train them up to 40 times faster (“Projected GANs”), and achieve state-of-the-art image synthesis at scale (“StyleGAN-XL”, “StyleGAN-T”).
Bio: Axel Sauer is a final-year Ph.D. student in the Autonomous Vision Group at the University of Tübingen, advised by Prof. Andreas Geiger. He works in computer vision, graphics, and robotics, with a focus on advanced methods for image and video synthesis. He is known for his recent work on scaling StyleGAN, making it faster, and combining it with NLP modules to generate high-quality images from text. He obtained his M.Sc. and B.Sc. degrees from the Karlsruhe Institute of Technology. He was also a visiting researcher at ETH Zurich, TUM, and MBZUAI, and a research intern at NVIDIA.