Title: Some statistical methods for the analysis of neural data, and pandemics.
Abstract
In modern scientific setups we face unprecedented challenges in processing data efficiently and robustly. These challenges often reveal the brittleness of our current tools, dictating the need for new methods and motivating new questions. In this talk I will give two examples of how the pressure of formidable scientific problems necessitates the deployment of new statistical methods.
First, I will focus on my contribution to NeuroPAL, a breakthrough technology that enables the colorful imaging of every single neuron in the brain of the worm C. elegans. I will describe new methods for two difficult tasks arising in these datasets, neural segmentation and identification, where classical methods fall short. Behind these methods there is a key statistical physics principle, the so-called Schrödinger bridge, a ‘thought experiment’ that realizes the solution of an entropy-regularized optimal transport problem. This thought experiment was proposed in 1932 but has yet to percolate into the mainstream of statistics. I will comment on the statistical (sample complexity) benefits of entropic optimal transport and on how a loss function based on this principle is a better optimization objective than the log-likelihood for clustering, reducing pathologies such as bad local optima and inconsistency. As a consequence, a new algorithm derived from this loss, Sinkhorn EM, attains better and more robust neural segmentation performance. Then, I will comment on how these principles can be used to identify neurons in C. elegans probabilistically, leading to meaningful uncertainty quantification in this hard combinatorial setting. I will further comment on how these novel methods have proven useful in other contexts such as deep learning.
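To make the entropy-regularized optimal transport problem mentioned above concrete, here is a minimal sketch of the standard Sinkhorn matrix-scaling iterations in Python with NumPy. It is not the Sinkhorn EM algorithm from the talk, only the generic building block such methods rest on. The function name `sinkhorn`, the regularization strength `eps`, the iteration count, and the toy point clouds are all assumptions chosen for illustration.

```python
import numpy as np

def sinkhorn(cost, a, b, eps=0.5, n_iters=200):
    """Approximate the entropic optimal transport plan between histograms a and b.

    cost: (n, m) matrix of pairwise transport costs
    a: (n,) source weights summing to 1; b: (m,) target weights summing to 1
    eps: entropic regularization strength (illustrative default)
    """
    K = np.exp(-cost / eps)          # Gibbs kernel
    u = np.ones_like(a)
    v = np.ones_like(b)
    for _ in range(n_iters):
        u = a / (K @ v)              # rescale rows toward marginal a
        v = b / (K.T @ u)            # rescale columns toward marginal b
    return u[:, None] * K * v[None, :]

# Toy example (hypothetical data): two small 1-D point clouds, squared-distance cost.
x = np.linspace(0.0, 1.0, 5)
y = np.linspace(0.0, 1.0, 4)
cost = (x[:, None] - y[None, :]) ** 2
a = np.full(5, 1 / 5)
b = np.full(4, 1 / 4)

plan = sinkhorn(cost, a, b)
print(plan.sum(axis=1))  # row sums, should be close to a
print(plan.sum(axis=0))  # column sums, should be close to b
```

The returned plan is a soft, probabilistic matching between the two point clouds. For small `eps`, implementations usually run these updates in the log domain to avoid numerical underflow in the kernel.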
Second, I will present some of my work on a recent pressing problem, the analysis of the true impact of the ongoing pandemic. My work (joint with Pamela Martinez) offers a comprehensive narrative of the different pathways mediating the relation between socioeconomic status and worse outcomes. Here, the main challenge is to eliminate the biases introduced by the reporting mechanisms. I show how deconvolution models and hierarchical Bayesian models can be used to obtain a more reliable understanding of the transmission dynamics and their effects on different populations.