Title: Data Integration in Single-Cell and Spatial Omics: What Is Erased, and Can You Recover It?
Abstract: In high-throughput biological experiments, data integration is a ubiquitous and foundational challenge that affects all downstream analyses. This talk will dissect that challenge in single-cell and spatial omics, where aligning cells across samples and data modalities is crucial to current analysis pipelines. I will place data integration problems on a spectrum from weak to strong linkage. Weak linkage arises when integrating data with few shared features, such as single-cell RNA sequencing data and spatial proteomics data. For this setting, I will present MaxFuse (Chen, Zhu et al., 2024), a method that leverages all features, not just the shared ones, to achieve accurate integration. Strong linkage occurs when integrating data from the same modality, such as single-cell RNA sequencing across batches. Here, no clear guidelines currently exist for distinguishing biological signals from batch effects, which leads to trial-and-error approaches. I will provide evidence that existing paradigms are overly aggressive and erase meaningful biological variation. I will then introduce CellANOVA (Zhang et al., 2024), a novel model and algorithm that uses experimental design principles to recover biological signals lost during integration.
Shuxiao Chen, Bokai Zhu et al., Integration of spatial and single-cell data across modalities with weakly linked features. Nature Biotechnology 42, 1096–1106 (2024).
Zhaojun Zhang et al., Recovery of biological signals lost in single-cell batch integration with CellANOVA. Nature Biotechnology, accepted in principle (2024).