College of LAS Events

If you will need disability-related accommodations in order to participate, please email the contact person for the event.
Early requests are strongly encouraged to allow sufficient time to meet your access needs.

Illinois - Purdue Joint Statistics Seminar

Event Type
Ceremony/Service
Sponsor
Illinois Department of Statistics - Purdue Department of Statistics
Location
Danville Country Club, 2718 Denmark Road, Danville, IL
Date
Apr 9, 2022   9:15 am - 5:15 pm  
Originating Calendar
Department of Statistics Event Calendar

UIUC and Purdue Statistics Joint Seminar (Central Standard Time)

Time:  9:00am CST – 5:30pm CST, April 9, Saturday

Location:  Danville Country Club, 2718 Denmark Road, Danville, IL

(Kickapoo State Park Recreation Center, 10906 Kickapoo Park Rd. Oakwood IL for afternoon activities)

9:00am – 9:15am Arrive at Danville Country Club

9:15am – 9:30am  Welcome and opening remarks (Profs. Bo Li and Dennis Lin)

9:30am – 11:00am  Seminar Session I

9:30am – 10:00am Talk A:  Predictions, Role of Interventions and the Crisis of Virus in India: A Data Science Call to Arms, Prof. Bhramar Mukherjee, Biostatistics Department, University of Michigan

10:00am – 10:30am Talk B: Self-Supervised Metric Learning in Multi-View Data: A Downstream Task Perspective, Prof. Shulei Wang, Statistics Department, UIUC

10:30am – 11:00am Talk C: A Cross Validation Framework for Signal Denoising, Prof. Sabyasachi Chatterjee, Statistics Department, UIUC

11:00am – 11:15am Coffee Break

11:15am – 12:45pm Seminar Session II

11:15am – 11:45am Talk D: Towards Optimal Rerandomization: a Reconciliation between Randomized and Optimal Designs, Prof. Xinran Li, Statistics Department, UIUC

11:45am – 12:15pm Talk E: Bayesian Inference on High-dimensional Multivariate Binary Data, Prof. Antik Chakraborty, Statistics Department, Purdue University

12:15pm – 12:45pm Talk F: A Nonparametric Regression Alternative to Empirical Bayes Methods, Prof. Dave Zhao, Statistics Department, UIUC

12:45pm – 2:00pm Lunch Buffet, Danville Country Club

2:00pm – 2:30pm Travel to Kickapoo State Park Recreation Center

2:30pm – 5:30pm Outdoor activities: hiking, boating, fishing, etc.

Websites:

https://www.golfdanvillecc.com/

https://www2.illinois.gov/dnr/Parks/Pages/Kickapoo.aspx

Titles and Abstracts

Talk A: Professor Bhramar Mukherjee, Biostatistics Department, University of Michigan

Title: Predictions, Role of Interventions and the Crisis of Virus in India: A Data Science Call to Arms

Abstract: India, the world's largest democracy, had three very different surges of SARS-CoV-2 in 2020, 2021, and 2022, corresponding to the transmission of the ancestral strain, the rise of the Delta variant, and the final Omicron wave. Human behavior and public health intervention strategies were also very different during these three waves. In this presentation, we provide a brief chronicle of the modeling experience of our study team over the last two years, looking at the data from India, leading to the development of a tiered data-driven framework for public health interventions towards pandemic resilience. Through mathematical modeling, we study the timing and duration of public health interventions, with intervention effects estimated from actual data. We illustrate that early and sustained interventions can help us avoid harsh lockdowns and reduce COVID mortality drastically. We also quantify the estimated number of missing COVID deaths in India, which is orders of magnitude larger than the number of reported COVID deaths. This is joint work with many collaborators, with all supporting research materials and products available at covind19.org.

Talk B: Professor Shulei Wang, Statistics Department, University of Illinois Urbana-Champaign
Title: Self-Supervised Metric Learning in Multi-View Data: A Downstream Task Perspective
Abstract: Self-supervised metric learning has been a successful approach for learning a distance from an unlabeled dataset. The resulting distance is broadly useful for improving various distance-based downstream tasks, even when no information from downstream tasks is utilized in the metric learning stage. To gain insights into this approach, we develop a statistical framework to theoretically study how self-supervised metric learning can benefit downstream tasks in the context of multi-view data. Under this framework, we show that the target distance of metric learning satisfies several desired properties for the downstream tasks. On the other hand, our investigation suggests the target distance can be further improved by moderating each direction’s weights. In addition, our analysis precisely characterizes the improvement by self-supervised metric learning on four commonly used downstream tasks: sample identification, two-sample testing, k-means clustering, and k-nearest neighbor classification. When the distance is estimated from an unlabeled dataset, we establish the upper bound on distance estimation’s accuracy and the number of samples sufficient for downstream task improvement.

Talk C: Professor Sabyasachi Chatterjee, Statistics Department, University of Illinois Urbana-Champaign

Title: A Cross Validation Framework for Signal Denoising
Abstract: In this talk, I will explain a general and theoretically tractable K-fold cross-validation (CV) framework for fixed-design regression and signal denoising methods. I will consider Trend Filtering, a popular nonparametric regression procedure, as a running example. The resulting cross-validated version of Trend Filtering provably (nearly) attains the same rates of convergence known for the optimally tuned version. No theoretical analysis existed for a cross-validated version of Trend Filtering before this work.
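For readers unfamiliar with K-fold CV in a fixed-design denoising setting, the general idea can be sketched as follows. This is only an illustration and not the framework from the talk: it swaps in a simple local-averaging smoother in place of Trend Filtering, and the function names, the interleaved fold assignment, and the candidate half-widths are all assumptions made for the example.

```python
import math
import random


def local_mean(train, position, half_width):
    """Average the training values whose index is within half_width of position."""
    near = [y for i, y in train if abs(i - position) <= half_width]
    if not near:
        # Fall back to the single nearest training point if the window is empty.
        near = [min(train, key=lambda p: abs(p[0] - position))[1]]
    return sum(near) / len(near)


def kfold_cv_error(y, half_width, k):
    """Mean squared prediction error over k interleaved folds (index mod k)."""
    total = 0.0
    for fold in range(k):
        train = [(i, v) for i, v in enumerate(y) if i % k != fold]
        held_out = [(i, v) for i, v in enumerate(y) if i % k == fold]
        total += sum((v - local_mean(train, i, half_width)) ** 2 for i, v in held_out)
    return total / len(y)


def choose_half_width(y, candidates, k=5):
    """Pick the candidate tuning parameter with the smallest CV error."""
    return min(candidates, key=lambda h: kfold_cv_error(y, h, k))


# Hypothetical data: a noisy sine wave observed on a fixed grid of 200 points.
rng = random.Random(1)
n = 200
y = [math.sin(2 * math.pi * i / n) + rng.gauss(0, 0.3) for i in range(n)]
candidates = [1, 2, 4, 8, 16, 32]
best = choose_half_width(y, candidates)
```

Each observation is predicted only from points in the other folds, so the tuning parameter is chosen without reusing an observation to predict itself.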

Talk D: Professor Xinran Li, Statistics Department, University of Illinois Urbana-Champaign
Title: Towards optimal rerandomization: a reconciliation between randomized and optimal designs
Abstract: Completely randomized experiments have been the gold standard for drawing causal inference because they can balance all potential confounding on average. However, they can often suffer from unbalanced covariates for realized treatment assignments. Rerandomization, a design that rerandomizes the treatment assignment until a prespecified covariate balance criterion is met, has recently gained attention due to its easy implementation, improved covariate balance, and more efficient inference. Researchers have then suggested using the assignments that minimize the covariate imbalance, namely the optimally balanced design. This has reignited the long-standing controversy between two philosophies for designing experiments: randomization versus optimal and thus almost deterministic designs. Existing literature has argued that rerandomization with overly balanced observed covariates can lead to highly imbalanced unobserved covariates, making it vulnerable to model misspecification. In contrast, rerandomization with properly balanced covariates can provide robust inference for treatment effects while sacrificing some efficiency compared to the ideally optimal design. In this paper, we show that, by making the covariate imbalance diminish at a proper rate as the sample size increases, rerandomization can achieve the ideally optimal precision that one can expect with perfectly balanced covariates while still maintaining its robustness. In particular, we provide the necessary and sufficient condition on the number of covariates for achieving the desired optimality. Our results rely on a more delicate asymptotic analysis for rerandomization. The derived theory for rerandomization provides a deeper understanding of its large-sample properties and can better guide its practical implementation. Furthermore, it also helps reconcile the controversy between randomized and optimal designs.
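The basic rerandomization procedure described in the abstract (redraw the assignment until a balance criterion is met) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the design analyzed in the talk: the balance criterion used here (largest absolute standardized difference in covariate means) is a simple stand-in chosen for the example, and the function names, threshold, and simulated covariates are hypothetical.

```python
import random
import statistics


def max_std_mean_diff(covariates, assignment):
    """Largest absolute standardized mean difference between arms, over covariates."""
    worst = 0.0
    for j in range(len(covariates[0])):
        column = [row[j] for row in covariates]
        treated = [x for x, z in zip(column, assignment) if z == 1]
        control = [x for x, z in zip(column, assignment) if z == 0]
        scale = statistics.pstdev(column) or 1.0  # guard against a constant covariate
        diff = abs(statistics.fmean(treated) - statistics.fmean(control)) / scale
        worst = max(worst, diff)
    return worst


def rerandomize(covariates, n_treated, threshold, rng, max_draws=10_000):
    """Redraw a complete randomization until the balance criterion is met."""
    n = len(covariates)
    base = [1] * n_treated + [0] * (n - n_treated)
    for draw in range(1, max_draws + 1):
        assignment = base[:]
        rng.shuffle(assignment)
        if max_std_mean_diff(covariates, assignment) <= threshold:
            return assignment, draw
    raise RuntimeError("no acceptable assignment found; loosen the threshold")


# Hypothetical experiment: 40 units, 2 covariates, 20 units treated.
rng = random.Random(0)
covariates = [[rng.gauss(0, 1), rng.gauss(0, 1)] for _ in range(40)]
assignment, draws = rerandomize(covariates, n_treated=20, threshold=0.1, rng=rng)
```

A smaller threshold gives better balance on the observed covariates but requires more draws and leaves fewer acceptable assignments, which is the trade-off between randomized and near-deterministic designs that the abstract discusses.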

Talk E: Professor Antik Chakraborty, Statistics Department, Purdue University

Title: Bayesian inference on high-dimensional multivariate binary data

Abstract: It has become increasingly common to collect high-dimensional binary data; for example, with the emergence of new sampling techniques in ecology. In smaller dimensions, multivariate probit (MVP) models are routinely used for inference. However, algorithms for fitting such models face issues in scaling up to high dimensions due to the intractability of the likelihood, which involves an integral over a multivariate normal distribution that has no analytic form. Although a variety of algorithms have been proposed to approximate this intractable integral, these approaches are difficult to implement and/or inaccurate in high dimensions. We propose a two-stage Bayesian approach for inference on model parameters while taking care of uncertainty propagation between the stages. We use the special structure of latent Gaussian models to reduce the highly expensive computation involved in joint parameter estimation to focus inference on marginal distributions of model parameters. This essentially makes the method embarrassingly parallel for both stages. We illustrate performance in simulations and applications to joint species distribution modeling in ecology.

Talk F: Professor Dave Zhao, Statistics Department, University of Illinois Urbana-Champaign
Title: A nonparametric regression alternative to empirical Bayes methods
Abstract: Empirical Bayes methods are becoming more and more popular in many areas of statistics. We provide an entirely frequentist formulation of empirical Bayes problems and describe nonparametric regression methods to solve them.