College of LAS Events

If you will need disability-related accommodations in order to participate, please email the contact person for the event.
Early requests are strongly encouraged to allow sufficient time to meet your access needs.

Master and Undergraduate Research Experience in Statistics (MURES) Symposium

Event Type
Seminar/Symposium
Sponsor
Department of Statistics
Location
Beckman Institute - Rooms 1005 and 3269
Date
Dec 10, 2025   10:00 am  
Originating Calendar
Department of Statistics Event Calendar

Master and Undergraduate Research Experience in Statistics (MURES) Symposium 

Dec 10, 2025 

Beckman Institute

10:00 – 10:50

Morning Session I

Room 1005

Morning Session I

Room 3269

 

Chair: Hyoeun Lee

Chair: Kit Clement

10:00 - 10:05

Opening Remarks

 

10:05 - 10:20

Benjamin Leidig, Yihan Lin, Jaehyung Kim - Machine Learning and Deep Learning Methods for Electricity Day-Ahead Price Forecasting

Mentor: Hyoeun Lee

Aditi Shrivastava - Examining student narrative preference in probability modeling

 

Mentor: Kit Clement

10:20 - 10:35

Yael Lindenberg - Graph autocorrelation testing for cluster-free differential gene expression

Mentor: Dave Zhao

Jonathan Petro - Establishing validity of narrative identity survey instrument for introductory statistics students

Mentor: Kit Clement

10:35 - 10:50

Mata Qin, Jiacheng Ye - Spatial Analysis of Blue Jay Spring Counts in Illinois: County-Level Trends and Spatial Dependence

Mentor: Weijia Jia

Jackson Fleege - Predicting use of narrative in probability modeling from students' narrative identity

 

Mentor: Kit Clement

11:00 – 12:00

Morning Session II

Room 1005

Morning Session II

Room 1005

 

Chair: Hyoeun Lee

Chair: Yuexi Wang

11:00 - 11:15

Emma Chen - Analysis of Carbon Tax in Electricity Markets via Machine Learning

Mentor: Gökçe Dayanıklı

Sahana Hariharan, Ivy Gu - Simulation-based Inference: A Predictive Perspective

Mentor: Yuexi Wang

11:15 - 11:30

Yichen Zhou - Neural Network Based Epidemic Control Python Package

Mentor: Gökçe Dayanıklı

Chaerin Kim, Junan Mao - Learning diffusion kernels on the cortical manifold

Mentor: Matthew Singh

11:30 - 11:45

Dongxiao Xie, Joey Li - Microbiome Differential Network Analysis

Mentor: Shulei Wang

Tailei Liu, Yunxi Zeng - Causal Inference in Observational Factorial Studies with Multi-Level Factors

Mentor: Ruoqi Yu

11:45 – 12:00

Rishita Mylavarapu - Modeling the Impact of Heat Waves on Health in India: data gaps and modeling challenges

Mentor: Lelys Bravo

 

12:00 - 13:00

Lunch

Atrium, Beckman Institute


 
 

13:00 – 14:00

Afternoon Session I

Room 1005

Afternoon Session I

Room 3269

 

Chair: V.N. Vimal Rao

Chair: Kit Clement

13:00 - 13:15

Sophia Stierwalt - Are students' attitudes towards statistics related to their perceptions of what statistics is?

 

Mentor: V.N. Vimal Rao

David He - Bayesian Modeling of Blue Jay Spring Counts in Illinois: County-Level Analysis with Survey Effort Adjustment

Mentor: Weijia Jia

13:15 – 13:30

Karena Liang - Exploring the relationship between Big5 personality characteristics and attitudes towards statistics

Mentor: V.N. Vimal Rao

Yi Wang, Zitao Zhang - Virtue of Auxiliary Variables: Approach to Agriculture Data

Mentor: Chan Park

13:30 – 13:45

Preetha Manjunath - Students' Attitudes towards AI in an Introductory Statistics Class

 

Mentor: V.N. Vimal Rao

Luke Thorell, Mariam Vaid - Does Simulation Software Matter? Exploring the Impact of Technology in Simulation-based Inference Curricula

Mentor: Kit Clement

13:45 – 14:00

Samhita Periyanayaham - Exploring statistics-situated tolerance of uncertainty and general tolerance of uncertainty and tolerance of ambiguity

Mentor: V.N. Vimal Rao

Advay Kadam, Ayush Shastry - Solar energy production forecasting for Illinois

 

Mentor: Hyoeun Lee

14:05 – 14:50

Afternoon Session II

Room 1005

Afternoon Session II

Room 3269

 

Chair: Pamela Martinez

Chair: Kelly Findley

14:05 – 14:20

Joules Apura - Understanding the impact of life history on pathogen evolution

Mentor: Pamela Martinez

Cheng Ai - Learning to code with Generative AI: What could go wrong?

Mentor: Kelly Findley

14:20 – 14:35

Carrie Song - Exploring Seasonal Dynamics and Age-Specific Patterns of Respiratory Diseases

Mentor: Pamela Martinez

Rebecca Shi, Joshua Zhang - Reconstructing Brain Signals from Neurostimulation Data

Mentor: Matthew Singh

14:35 – 14:50

Ethan Chen - Psychometrics of Personality

 

Mentor: Jeff Douglas

Qihao Zhang - Assessing heat waves trends for Boston and Chicago under present and future climate scenarios

Mentor: Lelys Bravo

 

Abstracts

Benjamin Leidig, Yihan Lin, Jaehyung Kim - Machine Learning and Deep Learning Methods for Electricity Day-Ahead Price Forecasting

Mentor: Hyoeun Lee

As socio-economic dependence on electricity prices increases, the ability to anticipate price fluctuations is essential for maintaining system stability. While statistical forecasting methods attempt to address this need, artificial neural networks often fail to capture the innate volatility of electricity prices due to overfitting. Other approaches, including modern machine learning methods and energy system models, often perform better during volatile periods but demonstrate reduced accuracy under typical price fluctuations. To address this challenge, we systematically evaluate the performance of deep learning, machine learning, and data manipulation methodologies across both stable and volatile market conditions. The models we use consider a wide range of external variables, like grid load, cross-border electricity trading volume, and weather, to capture more complex and long-term patterns in prices. Logistical constraints regarding real-time electricity trading (such as prices for each hour of the next day being set all at once on the day before) are carefully considered in our dataset formation. Additionally, techniques like the mirror-logarithmic transformation and target normalization are used to reduce the impact of historical price spikes in model fitting. Ultimately, we compare numerous modern machine learning methods against more complex deep learning models, including a contrast of bias and variance metrics of test set predictions among models.
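As a rough illustration of the mirror-logarithmic transformation mentioned above, the sketch below uses one definition found in the electricity-price forecasting literature, sgn(x)·log(c·|x| + 1). Whether the presenters use this exact form, and the default value c = 1/3, are assumptions made here for illustration; the function names are hypothetical.

```python
import math

def mlog(price: float, c: float = 1.0 / 3.0) -> float:
    """Mirror-log transform: sgn(x) * log(c * |x| + 1).

    The scale constant c = 1/3 is an assumed default, not taken from the abstract.
    """
    return math.copysign(math.log1p(c * abs(price)), price)

def mlog_inverse(value: float, c: float = 1.0 / 3.0) -> float:
    """Invert mlog: sgn(y) * (exp(|y|) - 1) / c."""
    return math.copysign(math.expm1(abs(value)) / c, value)

# A price spike is compressed far more than a typical price,
# and negative prices are handled symmetrically.
typical, spike, negative = 40.0, 900.0, -25.0
transformed = [mlog(p) for p in (typical, spike, negative)]
recovered = [mlog_inverse(t) for t in transformed]
```

The transform is odd and monotone, so forecasts made on the transformed scale can be mapped back to prices with `mlog_inverse`.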

 Advay Kadam, Ayush Shastry - Solar energy production forecasting for Illinois

Mentor: Hyoeun Lee

Time series forecasting for the consumption of solar energy is crucial for understanding the economic impact of renewable energy systems. This study is especially significant to the Midwestern states in the United States, which are moving towards greater renewable energy development. Moreover, this study focuses on forecasting the generation of Alternating Current (AC) in the state of Illinois through the use of traditional machine learning models and deep learning techniques, such as XGBoost, TCN models, LSTM models, and hybrid CNN-LSTM-TF models. We extract hourly AC production data from the National Renewable Energy Laboratory for 2023 and 2024 in Illinois, providing meteorological and energy production data.

 Sahana Hariharan, Ivy Gu - Simulation-based Inference: A Predictive Perspective

Mentor: Yuexi Wang

Simulation-based inference (SBI) is a powerful framework for analyzing statistical models whose likelihood functions are difficult or impossible to compute but from which we can easily generate simulated data. In this project, we study two widely used SBI methods: Approximate Bayesian Computation (ABC) and Neural Posterior Estimation (NPE). Our goal is to understand how these methods behave when the underlying model may be misspecified, that is, when the assumed model does not perfectly match the real data-generating process. We take a predictive perspective, using a two-step procedure that evaluates and calibrates posterior samples based on how well they predict new data. This project will introduce students to modern Bayesian computation, generative modeling, and practical issues in applying SBI to real-world problems.
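For readers unfamiliar with ABC, a minimal rejection-sampling sketch follows. It is not the presenters' two-step predictive procedure; the toy model (a normal mean with known standard deviation), the uniform prior, the tolerance, and all function names are assumptions chosen only to show the basic idea of keeping parameter draws whose simulated data resemble the observed data.

```python
import random
import statistics

def abc_rejection(observed, prior_sampler, simulator, summary, tolerance, n_draws, seed=0):
    """Basic rejection ABC: keep prior draws whose simulated summary is close to the observed one."""
    rng = random.Random(seed)
    s_obs = summary(observed)
    accepted = []
    for _ in range(n_draws):
        theta = prior_sampler(rng)                       # draw a parameter from the prior
        simulated = simulator(theta, rng, len(observed)) # simulate a dataset of the same size
        if abs(summary(simulated) - s_obs) <= tolerance:
            accepted.append(theta)
    return accepted

# Toy setup (illustrative only): data ~ Normal(theta, 1), prior theta ~ Uniform(-5, 5).
observed = [1.8, 2.1, 2.4, 1.9, 2.3, 2.0, 1.7, 2.2]

def prior_sampler(rng):
    return rng.uniform(-5.0, 5.0)

def simulator(theta, rng, n):
    return [rng.gauss(theta, 1.0) for _ in range(n)]

posterior_draws = abc_rejection(
    observed, prior_sampler, simulator,
    summary=statistics.fmean, tolerance=0.2, n_draws=5000,
)
posterior_mean = statistics.fmean(posterior_draws)
```

No likelihood is evaluated anywhere: the simulator stands in for it, which is what makes the approach usable when the likelihood is intractable.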

 Mata Qin, Jiacheng Ye - Spatial Analysis of Blue Jay Spring Counts in Illinois: County-Level Trends and Spatial Dependence

Mentor: Weijia Jia

We analyzed long-term (2000–2024) Spring Bird Count (SBC) data from Illinois to assess population trends of Blue Jays (Cyanocitta cristata). The SBC is an annual census conducted by volunteers on the Saturday between May 4 and May 10 across all 102 Illinois counties. County-level changes in Blue Jay counts were visualized using a Shiny app. Spatial autocorrelation was assessed with Moran’s I, and to account for spatial dependence among neighboring counties, a spatial Conditional Autoregressive (CAR) model was applied, including survey effort and year as covariates. This approach provides robust estimates of county-level population trends while explicitly modeling spatial correlation and allowing interactive exploration of temporal patterns in volunteer-collected count data.
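As a small illustration of the spatial autocorrelation statistic named above, the sketch below computes global Moran's I from a list of area values and a spatial weight matrix. The four-county chain and its counts are made-up toy inputs, not Spring Bird Count data, and the binary adjacency weights are an assumption.

```python
def morans_i(values, weights):
    """Global Moran's I: (n / S0) * sum_ij w_ij z_i z_j / sum_i z_i^2, with z = values - mean."""
    n = len(values)
    if any(len(row) != n for row in weights) or len(weights) != n:
        raise ValueError("weights must be an n x n matrix")
    mean = sum(values) / n
    z = [v - mean for v in values]
    s0 = sum(sum(row) for row in weights)   # total weight
    denom = sum(d * d for d in z)           # sum of squared deviations
    if s0 == 0 or denom == 0:
        raise ValueError("Moran's I is undefined for zero weights or constant values")
    cross = sum(weights[i][j] * z[i] * z[j] for i in range(n) for j in range(n))
    return (n / s0) * cross / denom

# Toy example: four areas in a row, each adjacent only to its immediate neighbors.
chain = [
    [0, 1, 0, 0],
    [1, 0, 1, 0],
    [0, 1, 0, 1],
    [0, 0, 1, 0],
]
smooth_gradient = morans_i([1.0, 2.0, 3.0, 4.0], chain)    # neighbors are similar
checkerboard = morans_i([1.0, -1.0, 1.0, -1.0], chain)     # neighbors are opposite
```

Positive values indicate that neighboring areas tend to have similar counts, and negative values indicate that they tend to differ.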

 David He - Bayesian Modeling of Blue Jay Spring Counts in Illinois: County-Level Analysis with Survey Effort Adjustment

Mentor: Weijia Jia

We analyzed long-term (2000–2024) Spring Bird Count (SBC) data from Illinois to assess population trends of Blue Jays (Cyanocitta cristata). The SBC is a yearly bird census conducted by volunteers on the Saturday between May 4 and May 10 across all 102 Illinois counties. To account for variation in survey effort across parties and years, we applied a Bayesian hierarchical Areal Data Model incorporating an effort-adjustment term of the form exp(B·(effort^p−1)/p) within a Markov Chain Monte Carlo (MCMC) framework. Counts were modeled across counties and years to estimate temporal trends while explicitly accounting for spatial dependence. This approach provides robust estimates of Blue Jay population trends while addressing key sources of uncertainty in volunteer-collected, county-level count data.
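The effort-adjustment term exp(B·(effort^p−1)/p) quoted above can be evaluated directly; a minimal sketch follows. The function name and the example numbers are hypothetical, and the handling of p = 0 through the limiting value log(effort) is a convention added here rather than something stated in the abstract.

```python
import math

def effort_adjustment(effort: float, B: float, p: float) -> float:
    """Multiplicative effort adjustment exp(B * (effort**p - 1) / p)."""
    if effort <= 0:
        raise ValueError("effort must be positive")
    if p == 0:
        # Assumed convention: (effort**p - 1) / p tends to log(effort) as p -> 0.
        return math.exp(B * math.log(effort))
    return math.exp(B * (effort ** p - 1.0) / p)

# With effort = 1 the adjustment is exactly 1, so expected counts are unchanged.
baseline = effort_adjustment(1.0, B=0.8, p=0.5)
# More effort scales expected counts up when B > 0 (illustrative values only).
quadrupled = effort_adjustment(4.0, B=1.0, p=0.5)
```

In a count model, a term like this would typically multiply the expected count, so that differences in survey effort are not mistaken for population change.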

 Aditi Shrivastava - Examining student narrative preference in probability modeling

Mentor: Kit Clement

Research has shown that simulation-based inference (SBI) curricula deepen students’ understanding of key inferential concepts. This study focuses on an SBI curriculum where students construct models to simulate data generating processes in different scenarios. Students’ use of narrative thinking when building these simulation processes has been a recent focus of research; however, there has yet to be research documenting students’ use of narrative across an entire learning trajectory. We track the different models students create across three activities based on survey responses in a statistics class that uses an SBI curriculum and classify them as efficient or narrative based on whether they build a model that reflects common statistical practice or includes additional narrative elements of the context. Based on the initial analysis, more students prefer efficient models in the later activities than in the earlier ones. We can also see that students who start with narrative models, more often than not, switch to efficient models. Through this we can see that activity context and an increase in statistical knowledge affect the kinds of models students create. This has teaching implications for how to introduce novice students to statistical modeling and inference by introducing these key concepts in a way that expects broader narrative perspectives.

 Jonathan Petro - Establishing validity of narrative identity survey instrument for introductory statistics students

Mentor: Kit Clement

Narrative qualities of students’ inferential reasoning are a recent focus in research literature; however, there has not yet been work done to link these approaches to students’ personal narrative identity. To measure narrative identity, we use the Awareness of Narrative Identity Questionnaire (ANIQ). While ANIQ is a validated instrument for different aspects of narrative thinking within a broad range of participants, it is important to verify the internal consistency of ANIQ for introductory statistics students. To confirm this, exploratory factor analysis (EFA) and confirmatory factor analysis (CFA) were used to examine the relatedness of the items within their proposed category. Results from EFA show some items diverging from their expected categorizations, but the majority of items remain consistent with them. This suggests that ANIQ is an appropriate instrument to use for measuring narrative identity of introductory statistics students.

 Jackson Fleege - Predicting use of narrative in probability modeling from students' narrative identity

Mentor: Kit Clement

Traditional inference in introductory statistics often leads students to opt for more efficient, procedure-driven approaches. However, using simulation-based inference and probability modeling can cue narrative forms of explanation. The purpose of this study is to determine how students’ use of narrative versus efficient reasoning is predicted by their own narrative identity. Students completed three in-class sampler activities and a post-activity survey measuring narrative versus efficient reasoning. To measure narrative identity, students were also given the Awareness of Narrative Identity Questionnaire (ANIQ). Logistic regression models were fit for each activity to predict students’ use of narrative. Students’ thematic coherence significantly predicted narrative responses in the first activity, but as students progressed through the learning trajectory, this specific construct was not as important in predicting their use of narrative. This suggests that students’ narrative identity is less relevant in their inferential reasoning as they move through the learning trajectory. This has implications for teaching in how students’ thematic coherence can initially impact their approaches and how to best build students’ conceptual understanding of inference.

Luke Thorell, Mariam Vaid - Does Simulation Software Matter? Exploring the Impact of Technology in Simulation-based Inference Curricula

Mentor: Kit Clement

Simulation-based inference is well-established as an effective curriculum for instilling students with a strong conceptual understanding of inference; however, much of the research literature has not differentiated curricula by the simulation software students use. The present study compares student outcomes on the Simulation Understanding in Statistical Inference and Estimation instrument across two curricula: one that uses software that enables students to build their own probability models, and another that uses web-apps with pre-constructed simulations. This comparison revealed that students who built their own simulations were more successful in answering questions that focused on understanding the simulation process. This may imply that software which gives students more control over the simulation process may help further their understanding of simulation and inference itself.

Rishita Mylavarapu - Modeling the Impact of Heat Waves on Health in India: data gaps and modeling challenges

Mentor: Lelys Bravo

India has been severely affected by intense and frequent heat waves in recent years, with events lasting 5 days or more causing many fatalities. Recent studies indicate that India accounted for 20.74 per cent of global heatwave-related deaths between 1990 and 2019. In this work, we present a review of recent studies about this topic and assess the modelling approaches to connect a weather-related hazard like heat waves with mortality rates in India. We discuss potential data gaps and methodological approaches, focusing on strengthening data modeling capability to better capture the true human cost of extreme heat.

 Qihao Zhang - Assessing heat waves trends for Boston and Chicago under present and future climate scenarios

Mentor: Lelys Bravo

Several studies suggest that heat waves will become more frequent and intense with global warming. In this research, we analyzed several properties of heat waves for two important urban areas in the US: the Boston Metropolitan Area (BMA) and the Chicago Metropolitan Area (CMA). Using climate projections data from the Coupled Model Intercomparison Project Phase 6 (CMIP6), we identified and quantified four heat wave properties: duration, frequency, intensity and season length. We compared these properties for the historical simulated data (1950-2014) and future climate projections (2015-2100). We used CMIP6 multi-model ensembles with different scenarios of greenhouse gas concentrations or Representative Greenhouse Gas Concentration Pathways (RCPs) and estimated decadal trends using a non-parametric approach. The analysis suggests a significant increasing trend in heat wave intensity for future climate projections under all RCP scenarios, while heat wave duration, frequency and season length do not show significant trend changes.

 Yi Wang, Zitao Zhang - Virtue of Auxiliary Variables: Approach to Agriculture Data

Mentor: Chan Park

Cover crop adoption has become an increasingly important agricultural practice promoted for soil conservation and long-term sustainability. However, its effect on maize yield remains contested, partly because observational studies often face unmeasured confounding that can bias causal conclusions. In this project, we study the causal effect of cover crop adoption on maize yield while explicitly addressing these challenges. Rather than relying on the assumption of no unmeasured confounding, we employ the Control Outcome Calibration Approach (COCA), which incorporates pre-treatment yield measurements as an auxiliary variable. This framework allows us to estimate both the average treatment effect (ATE) and heterogeneous treatment effects across varying environmental conditions. Using satellite-based, county-level data across the U.S. Midwest, we find no statistically significant evidence that cover crop adoption affects maize yield once unobserved confounding is appropriately accounted for. Importantly, we identify substantial heterogeneity: precipitation emerges as a key effect modifier, with higher-precipitation regions exhibiting more negative estimated effects. These results suggest that the performance of cover cropping practices is closely tied to local climatic conditions, highlighting the need for region-specific agronomic recommendations.

 Cheng Ai - Learning to code with Generative AI: What could go wrong?

Mentor: Kelly Findley

Meaningful engagement in statistics and data science typically requires fluent use of statistical software. But learning to use statistical software, especially coding-based tools, has a steep learning curve for students enrolled in introductory statistics courses. Generative AI tools offer students tremendous opportunities for self-directed learning in statistics, including in the generation of code to visualize and analyze data. However, little is known about how learners of statistics interact with common AI tools to complete coding tasks and make sense of the generated results. In this exploratory study, we interviewed six students completing coding assignments in an introduction to biostatistics who were choosing to use an AI tool to assist them in generating code. We are completing qualitative thematic analysis to understand what advantages and barriers these students faced in completing their assignments this way. Our early results indicate that students’ perception of AI tools as an answer-generator, rather than a conversational partner, may be a significant hindrance to making sense of the code produced and to identifying misalignment between the AI response and the task instructions.

 Ethan Chen - Psychometrics of Personality

Mentor: Jeff Douglas

Much of the foundational psychometric work in personality was done at the University of Illinois decades ago, and utilized methods of linear factor analysis. Some theories suggest there are 16 distinct personalities, but don't reconcile the infinite latent space of factor analysis with the notion of finitely many personalities. Here we study a large dataset of 160 items and compare fit of linear item factor analysis models with discrete latent class models. Models are compared both with the BIC and a large cross validation dataset. It is seen that latent class models of around 16 classes fit better than factor analysis models. However, this does not prove the discreteness of personality due to some modeling restrictions in the continuous latent space factor analysis models.
