| Opening Session (10:00-10:50) Room 1005 | |
| Chair: Hyoeun Lee | |
10:00 - 10:10 | Opening Remarks | |
10:10 - 10:25 | Zheer Wang, Mohit Singh, Idrees Kudaimi - Era-adjusted baseball statistics: website, software, tech report, and interesting findings Mentor: Daniel Eck | |
| Morning Session (11:00-11:55) Room 1005 | Morning Session (10:35-11:55) Room 3269 |
| Chair: Hyoeun Lee | Chair: Pamela Martinez |
10:35 - 10:50 | Siddhant Gupta - Dimensionality Reduction in Neural Activity Simulations: A Computational Approach Using HNN-Core and Synaptic Weight Modulation Mentor: Matthew Singh | Rachel Zhou, Arseniy Titov - Simulation Toolbox for Epidemic Control Models with Mean Field Games Mentor: Gökçe Dayanıklı |
11:00 - 11:15 | Sanjana Addanki - How Course Modality and COVID-19 Impacted GPA Trends in LAS STEM and Humanities at UIUC Mentor: Christopher Kinson | Jeffrey Huang, Yunxi Zeng - Matching Methods for Observational Factorial Studies Mentor: Ruoqi Yu |
11:20 - 11:35 | Rong Xie, Kejun Sun, Yutao Rao - Bitcoin price prediction using various learning methods Mentor: Hyoeun Lee | Carrie Song - Understanding the influence of geographical and environmental factors on respiratory disease infections Mentor: Pamela Martinez |
11:40 - 11:55 | Leah Decatus-Haddad - Police Budgets and Crime Rates in Urbana Mentor: Christopher Kinson | Alyssa Anastasi - Analyzing health seeking behavior in response to respiratory diseases in Illinois Mentor: Pamela Martinez |
12:00 - 13:00 | Lunch, Atrium, Beckman Institute |
13:00 - 13:55 | Afternoon Session I Room 1005 | Afternoon Session I Room 3269 |
| Chair: Kit Clement | Chair: Lelys Bravo de Guenni |
13:00 - 13:15 | Luke Thorell, Elen Huang - Measuring Student Comprehension of the Simulation Process and its Product: An Analysis Contrasting Ownership of Modeling Simulations Versus Using Pre-Built Applets Mentor: Kit Clement | William Wang, Zitao Zhang - Estimating Causal Effects of Cover Crop Implementation on Crop Yield Using an Instrumental Variable Mentor: Chan Park |
13:20 - 13:35 | Madeline Hunt, Samin Hemani - Is Tolerance of Uncertainty Related to Students' Understanding of Statistics? Mentor: V.N. Vimal Rao | Tyler Hetch, Baoyuan Zhou - Extreme Heat Events impacts on Health: An analysis of Emergency Department Visits across the US during years 2018-2025 Mentor: Lelys Bravo de Guenni |
13:40 - 13:55 | Xiaoshan Huang - Bayesian Modeling of Rusty Blackbird Winter Counts in Arkansas Mentor: Weijia Jia | Jaehoon Jung - Assessing the Impact of Weather Constraints on Drone Flyability Using Geostatistical Methods Mentor: Lelys Bravo de Guenni |
14:00 - 14:55 | Afternoon Session II Room 1005 | Afternoon Session II Room 3269 |
| Chair: Hyoeun Lee | Chair: Lelys Bravo de Guenni |
14:00 - 14:15 | Mingqian Wang - Bridging Deep Learning and Symbolic Regression: A Hybrid Approach for Interpretability and Expressiveness Mentor: Matthew Singh | Karena Liang - Time Series Modeling of Malaria Cases: Integrating climate and mosquito data through machine learning Mentor: Lelys Bravo de Guenni |
14:20 - 14:35 | Shreyas Talluri - The creation of individualized brain models to control brain dynamics in response to TMS Mentor: Matthew Singh | Vinayak Bagdi - Assessing trends in heat waves intensity, frequency, duration, and season length for present and future climate in two locations: Boston metropolitan area (BMA) and Chicago metropolitan area (CMA) Mentor: Lelys Bravo de Guenni |
14:40 - 14:55 | Jerry Liang - Queue-Based Load Modeling and Detection of DoS Attacks Mentor: Georgios Fellouris | |
Sanjana Addanki - How Course Modality and COVID-19 Impacted GPA Trends in LAS STEM and Humanities at UIUC
Mentor: Christopher Kinson
This study explores how class format (online versus in-person) correlates with grade outcomes in the College of Liberal Arts & Sciences at UIUC. Using over 50,000 course records from 2016 to 2024, we investigate GPA patterns across STEM and Humanities disciplines before and after the COVID-19 pandemic. By transforming letter-grade distributions into average GPA scores and conducting t-tests, we find statistically significant increases in GPAs post-COVID for both STEM and Humanities. Interestingly, online courses in STEM showed a notably higher average GPA than their in-person counterparts post-COVID. Box plots further show that certain departments experienced sharper GPA increases than others; in particular, we highlight these differences in disciplines such as Chemistry, Italian, and Spanish. These findings suggest that course modality and pandemic-era changes in instruction may have long-lasting effects on academic performance. This research contributes to understanding how instructional shifts can differentially affect academic outcomes across fields.
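The two steps named in the abstract (converting a letter-grade distribution into an average GPA, then comparing groups with a t-test) can be sketched as follows. This is only an illustration: the grade-point mapping omits plus/minus grades, the section counts are made up, and Welch's unequal-variance t statistic is an assumed choice, since the abstract does not say which t-test was used.

```python
from math import sqrt
from statistics import mean, variance

# Assumed 4.0-scale mapping; plus/minus grades are omitted for brevity.
GRADE_POINTS = {"A": 4.0, "B": 3.0, "C": 2.0, "D": 1.0, "F": 0.0}

def average_gpa(grade_counts):
    """Turn a letter-grade distribution (count per letter) into an average GPA."""
    total = sum(grade_counts.values())
    return sum(GRADE_POINTS[g] * n for g, n in grade_counts.items()) / total

def welch_t(sample_a, sample_b):
    """Welch's t statistic for two samples with possibly unequal variances."""
    se = sqrt(variance(sample_a) / len(sample_a) + variance(sample_b) / len(sample_b))
    return (mean(sample_a) - mean(sample_b)) / se

# Made-up course sections, for illustration only (not the study's data).
pre_covid = [average_gpa(c) for c in (
    {"A": 10, "B": 10, "C": 5, "D": 3, "F": 2},
    {"A": 8, "B": 12, "C": 6, "D": 2, "F": 2},
    {"A": 12, "B": 9, "C": 6, "D": 2, "F": 1},
)]
post_covid = [average_gpa(c) for c in (
    {"A": 15, "B": 10, "C": 3, "D": 1, "F": 1},
    {"A": 14, "B": 11, "C": 4, "D": 1, "F": 0},
    {"A": 16, "B": 9, "C": 3, "D": 1, "F": 1},
)]

# A positive statistic means the post-COVID sections have the higher mean GPA.
t_stat = welch_t(post_covid, pre_covid)
```

A p-value would additionally require the Welch-Satterthwaite degrees of freedom and a t distribution, which a statistics library would normally supply.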
Rong Xie, Kejun Sun, Yutao Rao - Bitcoin price prediction using various learning methods
Mentor: Hyoeun Lee
This project evaluates the predictability of Bitcoin's returns using daily and weekly frequency observations through the application of traditional forecasting methods and advanced machine learning techniques. Specifically, the traditional method uses an ARMA(2,1)-GARCH(1,1) model under the Student's t distribution to examine the dynamics of daily and weekly logarithmic returns through mean reversion and fat-tail volatility clustering. Although the accuracy of numerical predictions is limited, converting the one-step-ahead conditional mean into a directional signal may significantly improve predictive ability. In addition, more advanced machine learning methods such as random forest, XGBoost, and LSTM models also confirm that while precise numerical predictions achieve limited success, forecasting market direction significantly improves when using lower-frequency aggregation. These results suggest that considerable noise in Bitcoin returns makes precise numerical forecasts problematic, but they highlight the utility of trend classification models, especially at a weekly frequency, for tactical trading and risk management.
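As a rough sketch of what "converting the conditional mean into a directional signal" and "lower-frequency aggregation" can mean in practice, the snippet below maps forecasts to up/down signs, scores them by hit rate, and sums daily log returns into weekly ones. The function names, the seven-day week (Bitcoin trades every day), and the sample numbers are assumptions for illustration; the forecasts would come from a fitted model such as the ARMA-GARCH described above, which is not reproduced here.

```python
def direction(x):
    """Map a return (or a forecast of it) to +1 for up and -1 otherwise."""
    return 1 if x > 0 else -1

def hit_rate(forecasts, realized):
    """Share of periods in which the forecast sign matches the realized sign."""
    hits = sum(direction(f) == direction(r) for f, r in zip(forecasts, realized))
    return hits / len(realized)

def to_weekly(daily_log_returns, days_per_week=7):
    """Aggregate daily log returns into weekly ones; log returns add over time."""
    last_start = len(daily_log_returns) - days_per_week + 1
    return [
        sum(daily_log_returns[i:i + days_per_week])
        for i in range(0, last_start, days_per_week)
    ]

# Hypothetical one-step-ahead forecasts and realized returns.
forecasts = [0.01, -0.02, 0.03, 0.01]
realized = [0.02, -0.01, -0.01, 0.005]
accuracy = hit_rate(forecasts, realized)  # three of four signs agree
```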
Luke Thorell, Elen Huang - Measuring Student Comprehension of the Simulation Process and its Product: An Analysis Contrasting Ownership of Modeling Simulations Versus Using Pre-Built Applets
Mentor: Kit Clement
Many studies have advocated for using simulation-based inference (SBI) in introductory statistics courses due to its power in helping students understand statistical inference at a deeper conceptual level. However, little research has investigated the various implementations of SBI in terms of the curricular design and the simulation technology that students use. Applet-based simulations are primarily focused on presenting the product of simulation to students, but their “black box” nature may obscure the process of simulation, thereby limiting students’ understanding. This study aims to compare the effectiveness of two different curricula: one using pre-built applet simulations and the other engaging students in modeling simulations from the ground up. Students from both curricula responded to an open-ended survey at the end of the semester as part of the Simulation Understanding in Statistical Inference and Estimation (SUSIE) instrument. Qualitative analysis was conducted on survey responses, which were double-coded according to this instrument. Results revealed similarities in the understanding of the products of the simulation; however, students who engaged in building simulations were more likely to have a stronger grasp of the simulation process. These results suggest that building and interacting with the simulation procedure may aid in developing students’ statistical reasoning.
Xiaoshan Huang - Bayesian Modeling of Rusty Blackbird Winter Counts in Arkansas
Mentor: Weijia Jia
We analyzed long-term (1965–2020) Christmas Bird Count (CBC) data to assess Rusty Blackbird (Euphagus carolinus) population trends in Arkansas. To address variable survey effort, we applied a Bayesian hierarchical model with an effort-adjustment term, exp(B·(effort^p − 1)/p), fitted within a Markov chain Monte Carlo (MCMC) framework. Our analysis employed a zero-inflated negative binomial regression to account for both excess zeros and overdispersion in count data. Spatial effects were modeled using latitude and longitude to assess geographic distribution shifts, while controlling for environmental covariates including temperature and forest cover. This approach provides robust trend estimates of bird counts by accounting for major sources of uncertainty in wildlife count data.
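To make the effort-adjustment term concrete, here is a minimal sketch of the function exp(B·(effort^p − 1)/p) on its own. The parameter values in the example are arbitrary, and the handling of p near zero (where the term tends to effort^B) is an added convenience, not something stated in the abstract. In the full model B and p would be estimated by MCMC rather than fixed.

```python
from math import exp

def effort_adjustment(effort, B, p):
    """Multiplicative effort effect exp(B * (effort**p - 1) / p).

    The exponent is a Box-Cox-type transform of effort, so the adjustment
    equals 1 when effort == 1 and tends to effort**B as p approaches 0.
    """
    if abs(p) < 1e-8:
        # Limiting case: (effort**p - 1) / p -> log(effort) as p -> 0.
        return effort ** B
    return exp(B * (effort ** p - 1.0) / p)

# Arbitrary illustrative values: doubling effort raises the expected count,
# but by less than a factor of two when the adjustment shows diminishing returns.
ratio = effort_adjustment(2.0, B=0.6, p=0.5) / effort_adjustment(1.0, B=0.6, p=0.5)
```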
William Wang, Zitao Zhang - Estimating Causal Effects of Cover Crop Implementation on Crop Yield Using an Instrumental Variable
Mentor: Chan Park
Many real-world causal inference questions are answered under the assumption of no unmeasured confounding, meaning that all common causes of both treatment and outcome are accounted for. For example, recent studies have examined the causal effect of cover crop implementation on crop yield in Midwest states using satellite data under this assumption. Their findings suggest that cover crop implementation leads to yield losses. However, because these approaches do not account for potential unmeasured confounders (e.g., farmers' skills or management practices), the conclusions may be biased or even indicate an effect opposite to the true causal effect. To better estimate the causal effect, we employ an instrumental variable (IV) approach, a widely used method in economics, epidemiology, and statistics. In our study, we hand-collected county-level cover crop incentives to serve as an IV, as we believe they reasonably meet the criteria for a valid IV. Using this IV, we reanalyzed the satellite data to estimate the average and heterogeneous treatment effects of cover crop implementation on crop yield while accounting for potential unmeasured confounding. Our analysis finds no statistical evidence that cover crop implementation causes yield losses, contrasting with previous studies that relied on the assumption of no unmeasured confounding. Moreover, we find that this effect is strongly related to temperature and solar radiation, indicating substantial regional variation in the effectiveness of cover cropping practices.
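The contrast between a naive estimate and an IV estimate can be illustrated on simulated data. The sketch below is not the authors' analysis: the data-generating process, the variable names, and the use of the simple single-instrument ratio estimator cov(z, y)/cov(z, d) are all assumptions chosen to show how an unmeasured confounder biases a regression slope while a valid instrument does not.

```python
import random

def cov(x, y):
    """Sample covariance of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    return sum((a - mx) * (b - my) for a, b in zip(x, y)) / (len(x) - 1)

rng = random.Random(0)  # fixed seed so the simulation is reproducible
n = 20_000
TRUE_EFFECT = 2.0  # effect of treatment d on outcome y in this simulation

z = [rng.gauss(0, 1) for _ in range(n)]  # instrument (e.g. an incentive level)
u = [rng.gauss(0, 1) for _ in range(n)]  # unmeasured confounder (e.g. farmer skill)
# Treatment depends on the instrument and on the confounder.
d = [zi + ui + rng.gauss(0, 1) for zi, ui in zip(z, u)]
# Outcome depends on treatment and on the confounder, but not directly on z.
y = [TRUE_EFFECT * di + 3.0 * ui + rng.gauss(0, 1) for di, ui in zip(d, u)]

# Naive regression slope: biased because u drives both d and y.
naive_estimate = cov(d, y) / cov(d, d)
# IV ratio estimator: uses only the variation in d that is induced by z.
iv_estimate = cov(z, y) / cov(z, d)
```

In this setup the naive slope settles near 3 while the IV ratio stays close to the simulated effect of 2, because the instrument is independent of the confounder by construction. With real data, that independence is an assumption that cannot be fully tested.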
Tyler Hetch, Baoyuan Zhou - Extreme Heat Events impacts on Health: An analysis of Emergency Department Visits across the US during years 2018-2025
Mentor: Lelys Bravo de Guenni
Extreme heat and cold events are among the most important causes of climate-related deaths worldwide. According to Chen et al. (2024), five million deaths were attributed to extreme heat and cold globally between 2000 and 2019. Future climate projections indicate that heat-related deaths will increase and cold-related deaths will decrease under warmer climates. Understanding the impacts of extreme heat on health and on the demand for health services would provide a better estimate of the future climate-related illness burden. In this project we use data from the Heat and Health Tracker on the Centers for Disease Control and Prevention (CDC) website and other related data sources to investigate the impact of heat waves and extreme heat events on Emergency Department Visits (EDVs) associated with heat-related illnesses, after standardizing by population. Data were available on a daily basis at a regional level and were spatially aggregated to the 10 Health Department Regions (HDRs) across the US using an area-weighted average. Maximum daily temperature and the daily heat index, calculated using the National Weather Service approach, were analyzed to understand their seasonal cycle and variability across the HDRs. The potential association between extreme heat events, as determined by the peak times in maximum temperature and heat index, and the EDV time series was investigated and used in the analysis. Log-linear mixed-effects models were fitted to the EDV time series, accounting for seasonal effects, the impact of the climate variables, and vulnerability factors related to socio-economic conditions. Variability among regions in the dependence between the response and the predictors was accounted for through random effects in the proposed models. Prediction errors and goodness of fit were also assessed to evaluate model performance.
Jaehoon Jung - Assessing the Impact of Weather Constraints on Drone Flyability Using Geostatistical Methods
Mentor: Lelys Bravo de Guenni
Aerial drone operations have become increasingly crucial across diverse fields such as logistics, agriculture, and the military, yet no established system exists for verifying the appropriate weather conditions for those operations. In this study, we mainly refer to Weather Constraints on Global Drone Flyability (Mozhou Gao, 2021) and project its findings to a specific local region, North Korea. We pass daily meteorological data from the Korea Meteorological Administration (2015-2024) to a decision tree model that classifies a day as flyable when temperature, wind speed, and precipitation fall within drone operating conditions. We then interpolate the values between 27 observation posts across the region. During the spatial interpolation, or kriging, process, we set the mean trend as a linear combination of longitude, latitude, and altitude and identify the variance of the differences in the values. The results of this study show how to transform meteorological data such as temperature and wind speed into useful information on drone flyability, even in specific areas that do not have nearby weather stations. We expect this research to become a practical index for the decision-making process of planning drone operations in previously unexplored areas.
Karena Liang - Time Series Modeling of Malaria Cases: Integrating climate and mosquito data through machine learning
Mentor: Lelys Bravo de Guenni
Entomological surveillance is very important in remote tropical areas where malaria is endemic. However, data collection on mosquito abundance is costly and demanding, especially in remote regions, due to logistical considerations and a lack of consistency in collection methods. The availability of long time series on the number of different mosquito species present in a particular location would improve understanding of the seasonal fluctuations and biting behavior of the different mosquito vector species transmitting the malaria parasites. In this work, we use machine learning approaches for missing data imputation in the estimated mean number of mosquitoes over a data collection period from 2010 to 2016. We compare the different imputation methods using a leave-one-out cross-validation approach. A correction method based on the significance of seasonal effects on mosquito populations was implemented to maintain balanced data in mosquito population counts between trap and human-caught methods. We propose a generalized time series model for predicting the number of malaria cases as a function of climate drivers and mosquito abundance after data imputation. Model predictions aim to provide an analytic tool to estimate the incidence of the disease conditioned on several environmental factors.
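The leave-one-out comparison of imputation methods can be sketched as below: each observed value is hidden in turn, re-imputed from the remaining values, and scored by its error. The two imputers here (overall mean and average of the two neighbors) are deliberately simple stand-ins for the machine learning methods the abstract refers to, and the monthly counts are made up.

```python
from math import sqrt

def impute_mean(series, i):
    """Impute position i with the mean of all other observations."""
    others = [v for j, v in enumerate(series) if j != i]
    return sum(others) / len(others)

def impute_neighbors(series, i):
    """Impute position i by linear interpolation between its two neighbors."""
    return (series[i - 1] + series[i + 1]) / 2

def loo_rmse(series, impute):
    """Leave-one-out RMSE over interior points: hide each value, then re-impute it."""
    errors = [impute(series, i) - series[i] for i in range(1, len(series) - 1)]
    return sqrt(sum(e * e for e in errors) / len(errors))

# Made-up monthly mean mosquito counts with a seasonal shape (illustration only).
counts = [12, 15, 21, 30, 34, 29, 22, 16, 11, 9, 13, 19]

scores = {
    "mean": loo_rmse(counts, impute_mean),
    "neighbors": loo_rmse(counts, impute_neighbors),
}
best_method = min(scores, key=scores.get)  # lowest leave-one-out error wins
```

On a smooth seasonal series like this one, the neighbor-based imputer scores better than the overall mean; the same scoring loop would apply unchanged to more flexible imputers.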
Jeffrey Huang, Yunxi Zeng - Matching Methods for Observational Factorial Studies
Mentor: Ruoqi Yu
When implementing a randomized controlled trial is impractical, it is often necessary to use data from observational studies. In practice, such data suffer from confounding among covariates, which introduces bias into the analysis. Matching is a popular method used to achieve covariate balance in observational data in order to make unbiased causal estimates. In observational factorial studies, researchers are interested in estimating both the main and interaction effects of multiple treatments simultaneously. A commonly used method is to balance the covariates on a continuous weight measure, which could be a generalized propensity score or some covariate basis functions. Matching methods are less discussed in the literature for designing observational factorial studies. We therefore propose a novel matching method using mixed integer programming and robust balancing constraints derived from the basis functions of the covariates and the factors themselves, as is necessary in multi-factor settings.
Mingqian Wang - Bridging Deep Learning and Symbolic Regression: A Hybrid Approach for Interpretability and Expressiveness
Mentor: Matthew Singh
Popular machine learning models, especially deep learning models, currently achieve excellent predictive accuracy but often lack interpretability due to their black-box nature. On the other hand, classical symbolic regression, such as genetic programming, struggles with high-dimensional or noisy datasets. To address these two issues, we aim to develop a symbolic learning architecture that extracts mathematical relationships from data in an interpretable and structured manner via deep learning techniques. Some recent attempts to integrate deep learning techniques with symbolic regression have achieved meaningful progress. EQL networks integrate symbolic regression directly into a neural network by using simple mathematical operations (e.g., plus, log) as specialized activation functions. Inspired by the Kolmogorov-Arnold representation theorem, KANs replace the traditional weight parameters in MLPs with learnable univariate functions. Building on this work, we may design a new architecture by replacing KAN’s univariate splines with EQL-like symbolic functions while keeping KAN’s architecture. By doing this, we not only keep KAN’s ability to learn hierarchical compositions of functions but also avoid overly complex spline representations. We hope that integrating deep learning with symbolic regression will yield a more interpretable and expressive model that helps scientists find the relationships hidden in data.
Shreyas Talluri - The creation of individualized brain models to control brain dynamics in response to TMS
Mentor: Matthew Singh
This study aimed to create individualized brain models to predict brain dynamics in response to transcranial magnetic stimulation (TMS). The ability to do this can greatly benefit precision healthcare. Using a dataset of MRI and fMRI scans, I analyzed resting-state and task-based brain activity across subjects. Electroencephalography (EEG) and TMS were applied, and the resulting brain activity was recorded to capture dynamic responses. Data preprocessing included filtering faulty EEG channels, interpolating surrounding data, and rejecting artifacts such as blinking. Using MATLAB scripts, I ran a robust pipeline for data cleaning and interpolation, ensuring high-quality input for modeling. The individualized brain models were constructed by integrating multimodal neuroimaging data with computational techniques to estimate neural responses to TMS. These models provide insights into subject-specific brain dynamics and offer potential applications in optimizing TMS protocols for personalized therapeutic interventions.
Zheer Wang, Mohit Singh, Idrees Kudaimi - Era-adjusted baseball statistics: website, software, tech report, and interesting findings
Mentor: Daniel Eck
This presentation showcases a series of projects centered around Full House Modeling (FHM), a novel framework for era-adjusted baseball statistics. The students began by scraping and processing 2024 season data using FHM scripts, contributing metrics to both the public-facing website and the newly developed fullhouse R package, complete with documentation. Using these tools, several analyses were conducted: (1) a comparison of baseball eras, showing that Dead Ball Era pitching dominance largely vanishes when properly adjusted; (2) an era-adjusted breakdown of a hypothetical “Crosstown Classic” between the 2005 White Sox and 2016 Cubs, which tilts in favor of the Cubs; and (3) a deep dive into Juan Soto’s career, revealing that his first seven seasons rival the greatest stretches in MLB history when viewed through the lens of era-adjustment. Importantly, his era-adjusted OBP is within a single point of Ted Williams’s. Soto’s recent $750 million contract, then, may not be as exorbitant as it seems, especially as the Mets' bold acquisition potentially shifts New York’s baseball center of gravity away from the Yankees.