Master and Undergraduate Research Experience in Statistics (MURES) Symposium

Event Type
Seminar/Symposium
Sponsor
Department of Statistics
Location
Beckman Institute - Rooms 1005 and 3269
Date
May 7, 2025   10:00 am  
Originating Calendar
Department of Statistics Event Calendar


 

Opening Session (10:00-10:50)

Room 1005

 

 

Chair: Hyoeun Lee

 

10:00 - 10:10

Opening Remarks

 

10:10 - 10:25

Zheer Wang, Mohit Singh, Idrees Kudaimi - Era-adjusted baseball statistics: website, software, tech report, and interesting findings

Mentor: Daniel Eck

 

Morning Session (11:00-11:55)

Room 1005

Morning Session (10:35-11:55)

Room 3269

 

Chair: Hyoeun Lee

Chair: Pamela Martinez

10:35 - 10:50

Siddhant Gupta - Dimensionality Reduction in Neural Activity Simulations: A Computational Approach Using HNN-Core and Synaptic Weight Modulation

Mentor: Matthew Singh 

Rachel Zhou, Arseniy Titov - Simulation Toolbox for Epidemic Control Models with Mean Field Games

 

Mentor:  Gökçe Dayanıklı

11:00 - 11:15

Sanjana Addanki - How Course Modality and COVID-19 Impacted GPA Trends in LAS STEM and Humanities at UIUC

Mentor: Christopher Kinson

Jeffrey Huang, Yunxi Zeng - Matching Methods for Observational Factorial Studies

 

Mentor: Ruoqi Yu

11:20 - 11:35

Rong Xie, Kejun Sun, Yutao Rao - Bitcoin price prediction using various learning methods

Mentor: Hyoeun Lee

Carrie Song - Understanding the influence of geographical and environmental factors on respiratory disease infections

Mentor: Pamela Martinez

11:40 - 11:55

Leah Decatus-Haddad - Police Budgets and Crime Rates in Urbana

Mentor: Christopher Kinson   

Alyssa Anastasi - Analyzing health seeking behavior in response to respiratory diseases in Illinois

Mentor: Pamela Martinez         

12:00 - 13:00

Lunch

Atrium, Beckman Institute


 
 

13:00 - 13:55

Afternoon Session I

Room 1005

Afternoon Session I

Room 3269

 

Chair: Kit Clement

Chair: Lelys Bravo de Guenni

13:00 - 13:15

Luke Thorell, Elen Huang - Measuring Student Comprehension of the Simulation Process and its Product: An Analysis Contrasting Ownership of Modeling Simulations Versus Using Pre-Built Applets

Mentor: Kit Clement   

William Wang, Zitao Zhang - Estimating Causal Effects of Cover Crop Implementation on Crop Yield Using an Instrumental Variable

 

Mentor: Chan Park      

13:20 – 13:35

Madeline Hunt, Samin Hemani - Is Tolerance of Uncertainty Related to Students' Understanding of Statistics?

Mentor: V.N. Vimal Rao

Tyler Hetch, Baoyuan Zhou - Extreme Heat Events impacts on Health: An analysis of Emergency Department Visits across the US during years 2018-2025

Mentor: Lelys Bravo de Guenni         

13:40 – 13:55

Xiaoshan Huang - Bayesian Modeling of Rusty Blackbird Winter Counts in Arkansas

 

Mentor: Weijia Jia      

Jaehoon Jung - Assessing the Impact of Weather Constraints on Drone Flyability Using Geostatistical Methods

Mentor: Lelys Bravo de Guenni         

14:00 – 14:55

Afternoon Session II

Room 1005

Afternoon Session II

Room 3269

 

Chair: Hyoeun Lee

Chair: Lelys Bravo de Guenni

14:00 – 14:15

Mingqian Wang - Bridging Deep Learning and Symbolic Regression: A Hybrid Approach for Interpretability and Expressiveness

Mentor: Matthew Singh          

Karena Liang - Time Series Modeling of Malaria Cases: Integrating climate and mosquito data through machine learning

Mentor: Lelys Bravo de Guenni         

14:20 – 14:35

Shreyas Talluri - The creation of individualized brain models to control brain dynamics in response to TMS

 

 

Mentor: Matthew Singh          

Vinayak Bagdi - Assessing trends in heat wave intensity, frequency, duration, and season length for present and future climate in two locations: Boston metropolitan area (BMA) and Chicago metropolitan area (CMA)

Mentor: Lelys Bravo de Guenni         

14:40 – 14:55

Jerry Liang - Queue-Based Load Modeling and Detection of DoS Attacks

Mentor: Georgios Fellouris     

 


 

 Sanjana Addanki - How Course Modality and COVID-19 Impacted GPA Trends in LAS STEM and Humanities at UIUC

Mentor: Christopher Kinson

This study explores how class format (online versus in-person) correlates with grade outcomes in the College of Liberal Arts & Sciences at UIUC. Using over 50,000 course records from 2016 to 2024, we investigate GPA patterns across STEM and Humanities disciplines before and after the COVID-19 pandemic. By transforming letter-grade distributions into average GPA scores and conducting t-tests, we find statistically significant increases in GPAs post-COVID for both STEM and Humanities. Interestingly, online courses in STEM showed a notably higher average GPA than their in-person counterparts post-COVID. Box plots further highlight how certain departments experienced sharper GPA increases than others; we highlight these differences in disciplines such as Chemistry, Italian, and Spanish. These findings suggest that course modality and pandemic-era changes in instruction may have long-lasting effects on academic performance. This research contributes to understanding how instructional shifts can differentially affect academic outcomes across fields.
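As a rough illustration of the two steps named in the abstract (turning letter-grade counts into an average GPA, then comparing groups with a t-test), here is a minimal sketch. The grade-point mapping, the choice of Welch's unequal-variance statistic, and the course records are all assumptions made for the example; the abstract does not say which mapping or t-test variant the study used.

```python
from math import sqrt
from statistics import mean, variance

# Assumed 4.0-scale grade points (plus/minus grades omitted for brevity).
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def average_gpa(counts):
    """Average GPA of one course record from its letter-grade counts."""
    total = sum(counts.values())
    return sum(GRADE_POINTS[g] * n for g, n in counts.items()) / total

def welch_t(a, b):
    """Welch's t statistic for the difference in means, mean(a) - mean(b)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se

# Hypothetical course records, not data from the study.
pre_covid = [average_gpa(c) for c in (
    {"A": 10, "B": 10, "C": 5},
    {"A": 8, "B": 12, "C": 5},
    {"A": 12, "B": 8, "C": 5},
)]
post_covid = [average_gpa(c) for c in (
    {"A": 15, "B": 8, "C": 2},
    {"A": 14, "B": 9, "C": 2},
    {"A": 16, "B": 7, "C": 2},
)]

# A positive value means the post-COVID group has the higher mean GPA.
t_stat = welch_t(post_covid, pre_covid)
print(round(t_stat, 2))
```

A p-value would additionally require the t distribution with Welch-Satterthwaite degrees of freedom, which is left out here to keep the sketch within the standard library.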

Rong Xie, Kejun Sun, Yutao Rao - Bitcoin price prediction using various learning methods

Mentor: Hyoeun Lee

This project evaluates the predictability of Bitcoin's returns using daily and weekly frequency observations through the application of traditional methods of forecasting and advanced machine learning techniques. Specifically, the traditional method utilizes the ARMA(2,1)-GARCH(1,1) model under the Student's t distribution to examine the dynamics of daily and weekly logarithmic returns through mean reversion and fat-tail volatility clustering. Although the accuracy of numerical predictions is limited, converting the one-step-ahead conditional mean into a directional signal may significantly improve predictive ability. In addition, more advanced machine learning methods such as random forest, XGBoost, and LSTM models also confirm that while precise numerical predictions achieve limited success, forecasting market direction significantly improves when using lower-frequency aggregation. These results suggest the existence of considerable noise in the returns of Bitcoin that makes precise numerical forecasts problematic, but highlight the utility of trend classification models, especially at a weekly frequency, for tactical purposes of trading and risk management.
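The idea of converting a one-step-ahead conditional mean into a directional signal, and of aggregating daily log returns to a weekly frequency, can be sketched as follows. The function names, the seven-day week, and the numbers are illustrative assumptions; the forecasts here are placeholders and do not come from an ARMA-GARCH fit.

```python
def weekly_log_returns(daily, days_per_week=7):
    """Sum daily log returns into non-overlapping weekly log returns.

    Log returns are additive over time, so a weekly log return is the
    sum of the daily ones. Incomplete trailing weeks are dropped.
    """
    return [
        sum(daily[i:i + days_per_week])
        for i in range(0, len(daily) - days_per_week + 1, days_per_week)
    ]

def direction(x):
    """Map a return (or a forecast of one) to an up/down signal."""
    return 1 if x > 0 else -1

def hit_rate(forecasts, realized):
    """Share of periods where the forecast sign matches the realized sign."""
    hits = sum(direction(f) == direction(r) for f, r in zip(forecasts, realized))
    return hits / len(realized)

# Hypothetical conditional-mean forecasts and realized returns.
forecasts = [0.002, -0.001, 0.003, 0.001]
realized = [0.030, -0.020, -0.010, 0.040]
print(hit_rate(forecasts, realized))  # 0.75
```

The point of the sketch is that a forecast can be far off in magnitude (0.002 against 0.030) and still count as a correct directional call.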

Luke Thorell, Elen Huang - Measuring Student Comprehension of the Simulation Process and its Product: An Analysis Contrasting Ownership of Modeling Simulations Versus Using Pre-Built Applets

Mentor: Kit Clement

Many studies have advocated for using simulation-based inference (SBI) in introductory statistics courses due to its power in helping students understand statistical inference at a deeper conceptual level. However, little research has investigated the various implementations of SBI in terms of the curricular design and the simulation technology that students use. Applet-based simulations are primarily focused on presenting the product of simulation to students, but their “black box” nature may obscure the process of simulation, thereby limiting students’ understanding. This study aims to compare the effectiveness of two different curricula: one using pre-built applet simulations and the other engaging students in modeling simulations from the ground-up. Students from both curricula responded to an open-ended survey at the end of the semester as part of the Simulation Understanding in Statistical Inference and Estimation (SUSIE) instrument. Qualitative analysis was conducted on survey responses, which were double-coded according to this instrument. Results revealed similarities in the understanding of the products of the simulation; however, students who engaged in building simulations were more likely to have a stronger grasp of the simulation process. These results suggest that building and interacting with the simulation procedure may aid in developing students’ statistical reasoning.

Xiaoshan Huang - Bayesian Modeling of Rusty Blackbird Winter Counts in Arkansas

Mentor: Weijia Jia        

We analyzed long-term (1965–2020) Christmas Bird Count (CBC) data to assess Rusty Blackbird (Euphagus carolinus) population trends in Arkansas. To address variable survey effort, we applied a Bayesian hierarchical model with an effort-adjustment term, exp(B·(effort^p − 1)/p), within a Markov Chain Monte Carlo (MCMC) framework. Our analysis employed a zero-inflated negative binomial regression to account for both excess zeros and overdispersion in count data. Spatial effects were modeled using latitude and longitude to assess geographic distribution shifts, while controlling for environmental covariates including temperature and forest cover. This approach provides robust trend estimates of bird counts by accounting for major sources of uncertainty in wildlife count data.
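To make the effort-adjustment term concrete, the sketch below evaluates exp(B·(effort^p − 1)/p) as a multiplier on the expected count. The function name and the parameter values are hypothetical, and it assumes effort is scaled so that 1 is the reference effort level; in the study B and p would be estimated within the MCMC fit rather than fixed.

```python
from math import exp, log

def effort_multiplier(effort, B, p):
    """Multiplicative effect of survey effort on the expected count.

    Implements exp(B * (effort**p - 1) / p). As p approaches 0,
    (effort**p - 1) / p tends to log(effort), so the multiplier
    tends to effort**B; that limit is handled explicitly.
    """
    if p == 0:
        return exp(B * log(effort))
    return exp(B * (effort ** p - 1) / p)

# At the reference effort the multiplier is 1 regardless of B and p.
print(effort_multiplier(1.0, B=0.5, p=0.3))  # 1.0

# Hypothetical values: doubling effort raises the expected count,
# and less than proportionally when B is below 1 and p is at most 1.
print(effort_multiplier(2.0, B=0.5, p=0.3))
```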

 William Wang, Zitao Zhang - Estimating Causal Effects of Cover Crop Implementation on Crop Yield Using an Instrumental Variable

Mentor: Chan Park

Many real-world causal inference questions are answered under the assumption of no unmeasured confounding, meaning that all common causes of both treatment and outcome are accounted for. For example, recent studies have examined the causal effect of cover crop implementation on crop yield in Midwest states using satellite data under this assumption. Their findings suggest that cover crop implementation leads to yield losses. However, because these approaches do not account for potential unmeasured confounders (e.g., farmers' skills or management practices), the conclusions may be biased or even indicate an effect opposite to the true causal effect. To better estimate the causal effect, we employ an instrumental variable (IV) approach, a widely used method in economics, epidemiology, and statistics. In our study, we have hand-collected county-level cover crop incentives as an IV, as we believe they reasonably meet the criteria for a valid IV. Using this IV, we reanalyzed the satellite data to estimate the average and heterogeneous treatment effects of cover crop implementation on crop yield while accounting for potential unmeasured confounding. Our analysis finds no statistical evidence that cover crop implementation causes yield losses, contrasting with previous studies that relied on the assumption of no unmeasured confounding. Moreover, we find that this effect is strongly related to temperature and solar radiation, indicating substantial regional variation in the effectiveness of cover cropping practices.
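A toy example may help show why an instrument matters when a confounder is unmeasured. The sketch below uses the textbook single-instrument ratio estimator, cov(Z, Y) / cov(Z, D), on a tiny constructed dataset; the variable names and numbers are invented for illustration and this is not the estimator or data used in the study, which also examines heterogeneous effects.

```python
def cov(x, y):
    """Population covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / len(x)

# Constructed toy data (hypothetical):
#   Z: instrument (e.g. an incentive), unrelated to the confounder U
#   U: unmeasured confounder (e.g. farmer skill)
#   D: treatment, driven by both Z and U
#   Y: outcome, with a true treatment effect of 2 and a confounder effect of -3
Z = [0, 0, 1, 1]
U = [0, 1, 0, 1]
D = [z + u for z, u in zip(Z, U)]
Y = [2 * d - 3 * u for d, u in zip(D, U)]

# Naive regression slope of Y on D ignores U and is biased.
naive = cov(D, Y) / cov(D, D)

# The IV ratio uses only the variation in D that is induced by Z.
iv = cov(Z, Y) / cov(Z, D)

print(naive)  # 0.5
print(iv)     # 2.0
```

Because Z is uncorrelated with U in this construction, the IV ratio recovers the true effect of 2 while the naive slope is pulled down to 0.5 by the confounder.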

Tyler Hetch, Baoyuan Zhou - Extreme Heat Events impacts on Health: An analysis of Emergency Department Visits across the US during years 2018-2025

Mentor: Lelys Bravo de Guenni

Extreme heat and cold events are among the most important causes of climate-related deaths worldwide. According to Chen et al. (2024), five million deaths were attributed to extreme heat and cold globally between 2000 and 2019. Future climate projections indicate that heat-related deaths will increase and cold-related deaths will decrease under warmer climates. Understanding the impacts of extreme heat on health and on demand for health services would provide a better estimate of the future climate-related illness burden. In this project we use data from the Heat and Health Tracker on the Centers for Disease Control and Prevention (CDC) website and other related data sources to investigate the impact of heat waves and extreme heat events on Emergency Department Visits (EDVs) associated with heat-related illnesses after standardizing by population. Data were available on a daily basis at a regional level and were spatially aggregated to the 10 Health Department Regions (HDRs) across the US using an area-weighted average. Maximum daily temperature and daily heat index, as calculated using the National Weather Service approach, were analyzed to understand their seasonal cycle and variability across the HDRs. The potential association between extreme heat events, as determined by the peak times in maximum temperature and heat index, and the EDV time series was investigated and used in the analysis. Log-linear mixed effect models were fitted to the EDV time series, accounting for seasonal effects, the impact of the climate variables, and vulnerability factors related to socio-economic conditions. Variability among regions in the dependence between the response and predictors was accounted for as random effects in the proposed models. Prediction errors and goodness of fit were also assessed to evaluate model performance.
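The spatial aggregation step mentioned above, an area-weighted average of sub-regional values, can be sketched in a few lines. The function name and the example temperatures and areas are hypothetical and only illustrate the arithmetic.

```python
def area_weighted_average(values, areas):
    """Average of sub-region values, each weighted by that sub-region's area."""
    if len(values) != len(areas):
        raise ValueError("values and areas must have the same length")
    return sum(v * a for v, a in zip(values, areas)) / sum(areas)

# Hypothetical daily maximum temperatures (deg C) for two sub-regions
# whose areas are in a 1:3 ratio.
print(area_weighted_average([30.0, 34.0], [1.0, 3.0]))  # 33.0
```

The larger sub-region dominates: the result sits at 33.0 rather than the unweighted mean of 32.0.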

Jaehoon Jung - Assessing the Impact of Weather Constraints on Drone Flyability Using Geostatistical Methods

Mentor: Lelys Bravo de Guenni

Aerial drone operations have become increasingly crucial across industries such as logistics, agriculture, and the military, yet no established system exists for verifying that weather conditions are appropriate for those operations. In this study, we mainly refer to Weather Constraints on Global Drone Flyability (Mozhou Gao, 2021) and project its findings onto a specific local region, North Korea. We pass daily meteorological data from the Korean Meteorology Administration (2015-2024) to a decision tree model that classifies a day as flyable when temperature, wind speed, and precipitation fall within drone operating conditions. We then interpolate the values between 27 observation posts across the region. During the spatial interpolation, or kriging, process, we set the mean trend as a linear combination of longitude, latitude, and altitude and identify the variance of the differences in the values. The results of this study show how to transform meteorological data such as temperature and wind speed into useful information about drone flyability, even in areas that do not have nearby weather stations. We expect this research to become a practical index for the decision-making process of planning drone operations in previously unexplored areas.
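The classification step can be pictured as a small hand-written decision tree over the three weather variables. The sketch below is only illustrative: the threshold values in LIMITS are placeholders chosen for the example, not the operating limits used in the study or in the cited paper, and the station records are invented.

```python
# Hypothetical operating limits; real limits depend on the drone model.
LIMITS = {
    "min_temp_c": 0.0,
    "max_temp_c": 40.0,
    "max_wind_ms": 10.0,
    "max_precip_mm": 0.0,
}

def is_flyable(temp_c, wind_ms, precip_mm, limits=LIMITS):
    """Classify one day as flyable by checking each constraint in turn."""
    if precip_mm > limits["max_precip_mm"]:
        return False
    if wind_ms > limits["max_wind_ms"]:
        return False
    return limits["min_temp_c"] <= temp_c <= limits["max_temp_c"]

def flyable_fraction(days):
    """Share of flyable days in a list of (temp_c, wind_ms, precip_mm) records."""
    return sum(is_flyable(*day) for day in days) / len(days)

# Invented daily records for one observation post.
station_days = [
    (12.0, 3.0, 0.0),   # mild, calm, dry: flyable
    (-5.0, 2.0, 0.0),   # too cold
    (20.0, 14.0, 0.0),  # too windy
    (18.0, 4.0, 6.5),   # rain
]
print(flyable_fraction(station_days))  # 0.25
```

A per-station fraction like this is the kind of value that could then be interpolated between observation posts.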

Karena Liang - Time Series Modeling of Malaria Cases: Integrating climate and mosquito data through machine learning

Mentor: Lelys Bravo de Guenni

Entomological surveillance is very important in remote tropical areas where malaria is an endemic disease. However, data collection on mosquito abundance is costly and demanding, especially in remote regions, due to logistical considerations and a lack of consistency in the collection methods. The availability of long time series on the number of different mosquito species present in a particular location would improve understanding of the seasonal fluctuations and biting behavior of the different mosquito vector species transmitting the malaria parasites. In this work, we use machine learning approaches for missing data imputation in the estimated mean number of mosquitoes over a data collection period from 2010 to 2016. We compare the different imputation methods using a leave-one-out cross-validation approach. A correction method based on the significance of seasonal effects on mosquito populations was implemented to maintain balanced data in mosquito population counts between trap and human-caught methods. We propose a generalized time series model for predicting the number of malaria cases as a function of climate drivers and mosquito abundance after data imputation. Model predictions aim to provide an analytic tool to estimate the incidence of the disease conditioned on several environmental factors.
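Leave-one-out cross-validation for imputation can be sketched as follows: hide one observed value at a time, impute it, and score the error. The two imputers here (overall mean and linear interpolation between neighbours) are simple stand-ins chosen for the example, not the machine learning methods compared in the study, and the counts are invented.

```python
from math import sqrt

def impute_mean(series, i):
    """Impute position i with the mean of all other observations."""
    others = [v for j, v in enumerate(series) if j != i]
    return sum(others) / len(others)

def impute_linear(series, i):
    """Impute interior position i by averaging its two neighbours."""
    return (series[i - 1] + series[i + 1]) / 2

def loo_rmse(series, impute):
    """Leave-one-out RMSE over interior points for a given imputer."""
    idx = range(1, len(series) - 1)
    sq_errors = [(impute(series, i) - series[i]) ** 2 for i in idx]
    return sqrt(sum(sq_errors) / len(sq_errors))

# Invented mean mosquito counts with a seasonal rise and fall.
counts = [12, 18, 30, 26, 15, 9]

for name, method in (("mean", impute_mean), ("linear", impute_linear)):
    print(name, round(loo_rmse(counts, method), 2))
```

On this series the neighbour-based imputer scores a lower leave-one-out RMSE than the overall mean, which is the kind of comparison the cross-validation is meant to support.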

 Jeffrey Huang, Yunxi Zeng - Matching Methods for Observational Factorial Studies

Mentor: Ruoqi Yu

When implementing a randomized controlled trial is impractical, it is often necessary to use data from observational studies. In practice, such data suffers from issues of confounding among covariates, which introduces bias to the analysis. Matching is a popular method used to achieve covariate balance in observational data in order to make unbiased causal estimates. In observational factorial studies, researchers are interested in estimating both the main and interaction effects of multiple treatments simultaneously. A commonly used method is to balance the covariates on a continuous weight measure, which could be a generalized propensity score or some covariate basis functions.  Matching methods are less discussed in the literature for designing observational factorial studies. We, therefore, propose a novel matching method using mixed integer programming and robust balancing constraints derived from the basis functions of covariates and the factors themselves as is necessary in multi-factor settings.

 Mingqian Wang - Bridging Deep Learning and Symbolic Regression: A Hybrid Approach for Interpretability and Expressiveness

Mentor: Matthew Singh

Popular machine learning models, especially deep learning models, currently achieve excellent predictive accuracy but often lack interpretability due to their black-box nature. On the other hand, classical symbolic regression methods, such as genetic programming, struggle with high-dimensional or noisy datasets. To address these two issues, we aim to develop a symbolic learning architecture that extracts mathematical relationships from data in an interpretable and structured manner via deep learning techniques. Some recent attempts to integrate deep learning techniques with symbolic regression have achieved meaningful progress. EQL networks integrate symbolic regression directly into a neural network by using simple mathematical operations (e.g. plus, log) as specialized activation functions. Inspired by the Kolmogorov-Arnold representation theorem, KANs replace the traditional weight parameters in MLPs with learnable univariate functions. Building on this work, we may design a new architecture by replacing KAN’s univariate splines with EQL-like symbolic functions while keeping KAN’s architecture. By doing this, we not only keep KAN’s ability to learn hierarchical compositions of functions but also avoid overly complex spline representations. We hope that integrating deep learning with symbolic regression will yield a more interpretable and expressive model that helps scientists find the relationships hidden in data.

Shreyas Talluri - The creation of individualized brain models to control brain dynamics in response to TMS

 Mentor: Matthew Singh          

 This study aimed to create individualized brain models to predict brain dynamics in response to transcranial magnetic stimulation (TMS). The ability to do this can greatly benefit precision healthcare. Using a dataset of MRI and fMRI scans, I analyzed resting-state and task-based brain activity across subjects. Electroencephalography (EEG) and TMS stimulation were applied, and the resulting brain activity was recorded to capture dynamic responses. Data preprocessing included filtering faulty EEG channels, interpolating surrounding data, and rejecting artifacts such as blinking. By leveraging MATLAB scripts, I utilized a robust pipeline for data cleaning and interpolation, ensuring high-quality input for modeling. The individualized brain models were constructed by integrating multimodal neuroimaging data with computational techniques to estimate neural responses to TMS. These models provide insights into subject-specific brain dynamics and offer potential applications in optimizing TMS protocols for personalized therapeutic interventions.

Zheer Wang, Mohit Singh, Idrees Kudaimi - Era-adjusted baseball statistics: website, software, tech report, and interesting findings

Mentor: Daniel Eck

This presentation showcases a series of projects centered around Full House Modeling (FHM), a novel framework for era-adjusted baseball statistics. The student began by scraping and processing 2024 season data using FHM scripts, contributing metrics to both the public-facing website and the newly developed fullhouse R package, complete with documentation. Using these tools, several analyses were conducted: (1) a comparison of baseball eras, showing that Dead Ball Era pitching dominance largely vanishes when properly adjusted; (2) an era-adjusted breakdown of a hypothetical “Crosstown Classic” between the 2005 White Sox and 2016 Cubs, which tilts in favor of the Cubs; and (3) a deep dive into Juan Soto’s career, revealing that his first seven seasons rival the greatest stretches in MLB history when viewed through the lens of era-adjustment. Importantly, his era-adjusted OBP is within a single point of Ted Williams’s. Soto’s recent $750 million contract, then, may not be as exorbitant as it seems, especially as the Mets' bold acquisition potentially shifts New York’s baseball center of gravity away from the Yankees.

