Join us for the QCB Seminar featuring Rong Ma, Harvard
Modern Nonlinear Embedding Methods Unpacked: Empowering Biological Discoveries with Statistical Insights
Abstract: Learning and representing low-dimensional structures from noisy, high-dimensional data is a cornerstone of modern biomedical data science. Stochastic neighbor embedding algorithms, a family of nonlinear dimensionality reduction and data visualization methods whose two leading examples are t-SNE and UMAP, have become especially influential in recent years, particularly in single-cell analysis. Yet despite their popularity, these methods remain the subject of ongoing debate because of their limited theoretical understanding, ambiguous interpretations, and sensitivity to tuning parameters. In this talk, I will present our recent efforts to decipher, demystify, and improve these nonlinear embedding approaches. Our key results include a rigorous theoretical framework that uncovers the intrinsic mechanisms, large-sample limits, and fundamental principles underlying these algorithms; a set of theory-informed practical guidelines for their principled use in trustworthy biological discovery; and a collection of new algorithms that address current limitations and improve performance in areas such as bias reduction and stability. Throughout the talk, I will highlight how these advances not only deepen our statistical understanding but also open new avenues for biological insight.