PhD Final Defense – Xiaokai Yang

Jul 6, 2026, 9:00 am
Sponsor: Department of Civil and Environmental Engineering
Originating Calendar: CEE Seminars and Conferences

Atmospheric Chemistry Surrogate Modeling with Machine Learning

Advisor: Christopher W. Tessum

Location: Online via Zoom

Abstract

Atmospheric chemistry is among the most computationally expensive components of air-quality and climate models, because it requires solving high-dimensional, stiff systems of ordinary differential equations. This dissertation develops a progression of physics-informed, interpretable machine-learned surrogate models that accelerate gas-phase chemistry while avoiding the error accumulation that limits purely data-driven surrogates over long integrations.

First, we use Sparse Identification of Nonlinear Dynamics (SINDy), combined with a singular-value-decomposition latent space, to build a surrogate for a simplified photochemical mechanism. The model is an order of magnitude faster than the reference mechanism and, unlike previous machine-learned surrogates, exhibits no error accumulation over nine-day simulations.

Second, to scale to the near-explicit Master Chemical Mechanism, we introduce Sparse Identification of Mass Action Dynamics (SIMADy), which constrains the surrogate to a reaction network governed by the law of mass action and therefore guarantees bounded solutions. Using six latent chemical species, the surrogate reproduces ozone to within 25% of the reference root-mean-square concentration while running thousands of times faster on a CPU and roughly a million times faster on a GPU.

Third, we couple SIMADy to the full GEOS-Chem gas-phase mechanism within a three-dimensional chemical transport model over the contiguous United States, built in the EarthSciML modeling framework we developed. We reduce the 271-species mechanism to thirty state variables through hybrid lumping and refine the reaction network with non-negative machine-learned rate and stoichiometric corrections. Across four ten-day test windows spanning all four meteorological seasons, the surrogate reproduces the reference surface-ozone field with a coefficient of determination between 0.969 and 0.999 and a domain-mean bias within ±1.11 ppb. It also accelerates the gas-phase chemistry solver by 33.8×, keeps the solution non-negative by construction, and tracks the reference ozone response to emission perturbations outside its training distribution.
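To give a feel for the mass-action constraint described above, here is a minimal sketch of a mass-action right-hand side on a toy two-reaction network. It is not the SIMADy implementation: the species A, B, C, the rate constants, the function names, and the forward-Euler integrator are all assumptions chosen for illustration.

```python
import numpy as np

# Hypothetical toy network (not taken from the dissertation):
#   R1: A + B -> C   (k = 2.0)
#   R2: C -> A + B   (k = 0.5)
# Rows are reactions, columns are species [A, B, C].
REACTANTS = np.array([[1, 1, 0],
                      [0, 0, 1]])
PRODUCTS = np.array([[0, 0, 1],
                     [1, 1, 0]])
K = np.array([2.0, 0.5])  # non-negative rate constants (assumed values)


def mass_action_rhs(c, reactants, products, k):
    """Return dc/dt for concentrations c under the law of mass action."""
    # Rate of reaction j: k_j * prod_i c_i ** reactants[j, i]
    rates = k * np.prod(c ** reactants, axis=1)
    # Net stoichiometry maps reaction rates to species tendencies.
    return (products - reactants).T @ rates


def simulate(c0, dt=0.01, n_steps=2000):
    """Integrate with forward Euler (chosen for brevity, not accuracy)."""
    traj = np.empty((n_steps + 1, len(c0)))
    traj[0] = c0
    for n in range(n_steps):
        traj[n + 1] = traj[n] + dt * mass_action_rhs(
            traj[n], REACTANTS, PRODUCTS, K
        )
    return traj


# Initial concentrations for [A, B, C] in arbitrary units.
traj = simulate(np.array([1.0, 0.5, 0.0]))
print("minimum concentration:", traj.min())
print("final state [A, B, C]:", traj[-1])
```

In this toy network every loss term is proportional to the concentration of the species being consumed, so the continuous dynamics cannot push a species below zero, and the sums A + C and B + C are conserved. Together those two properties bound every concentration. Forward Euler keeps these properties here only because the step is small; a stiff atmospheric mechanism would need an implicit or otherwise stiff-aware solver.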

Together, these results show that parsimonious, mass-action-constrained machine learning can deliver fast, accurate, and interpretable surrogates of gas-phase chemistry suitable for large-scale simulation.
