TBA: Incorporating domain knowledge in machine learning based parameter estimation for watershed models
Xingyuan Chen1, Peishi Jiang1, Pin Shuai1, and Alex Sun2
1Pacific Northwest National Laboratory, Richland, Washington, USA
2University of Texas, Austin, Texas, USA
Process-based watershed models that couple subsurface, land-surface, and energy budget processes are highly desired at the watershed and basin scales to answer a wide range of science questions. Many parameters in those models are difficult and expensive to measure directly at the spatial extent and resolution required by fully distributed watershed models. The wide availability of stream surface flow data and remote-sensing data products, compared to groundwater monitoring data, provides new data sources for inverse modeling to infer parameters, including the soil and geologic properties, in integrated surface and subsurface hydrologic models. We have successfully applied deep neural networks (DNNs) to develop inverse mappings that capture complex, highly nonlinear relationships between model parameters and observed system responses across a number of watersheds within the United States. Given the increasing computational cost of fully distributed, mechanistic watershed models, we found that domain knowledge gained from multi-step sensitivity analyses can effectively reduce the dimensionality, and hence the size of the ensemble of forward simulations, required for training accurate DNNs. Instead of mapping all observations to all model parameters to be estimated, we strategized the estimation by systematically understanding the spatial and temporal information content in multiple types of data for each parameter. Using multi-year streamflow observations at the watershed outlets, we found that including the streamflow response from a dry year is more important than including wetter years for estimating subsurface properties. The evapotranspiration (ET) data products from remote sensing may not add information to watershed parameter estimation when streamflow observations are available. However, they could still be valuable for ungaged watersheds.
Our studies highlight the importance of developing and incorporating domain knowledge when applying machine learning methods to assist watershed modeling, shedding new light on broader applications of machine learning methods in various Earth science domains.
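As a rough illustration of the workflow described in the abstract (ensemble forward simulation, sensitivity-based screening, then an inverse mapping from observations to parameters), the following minimal sketch may help. Everything in it is an assumption made for illustration: `toy_forward_model` is a hypothetical two-term function standing in for a watershed model, the screening uses a simple correlation score rather than the multi-step sensitivity analyses used in the work, and a linear least-squares fit stands in for the DNN.

```python
import numpy as np

rng = np.random.default_rng(0)

def toy_forward_model(params, n_steps=30):
    """Hypothetical stand-in for a watershed model (not the authors' model).

    Columns 0 and 1 of `params` shape the streamflow-like output;
    column 2 has no effect, so screening should flag it as insensitive.
    """
    t = np.linspace(0.0, 1.0, n_steps)
    k = params[:, [0]]
    s = params[:, [1]]
    return k * np.exp(-3.0 * t) + s * t  # shape: (n_samples, n_steps)

# 1) Ensemble of forward simulations over sampled parameters, plus observation noise.
n_samples = 500
params = rng.uniform(0.0, 1.0, size=(n_samples, 3))
flows = toy_forward_model(params) + rng.normal(0.0, 0.01, size=(n_samples, 30))

# 2) Screening: keep parameters whose strongest correlation with any time step is large.
def max_abs_correlation(p, y):
    pz = (p - p.mean()) / p.std()
    yz = (y - y.mean(axis=0)) / y.std(axis=0)
    return np.abs(pz @ yz / len(p)).max()

scores = np.array([max_abs_correlation(params[:, j], flows) for j in range(3)])
sensitive = np.flatnonzero(scores > 0.3)  # threshold chosen for this toy case only

# 3) Inverse mapping: observations -> sensitive parameters (linear stand-in for a DNN).
train, test = slice(0, 400), slice(400, None)
X = np.hstack([flows, np.ones((n_samples, 1))])  # add a bias column
coef, *_ = np.linalg.lstsq(X[train], params[train][:, sensitive], rcond=None)
pred = X[test] @ coef
mae = np.abs(pred - params[test][:, sensitive]).mean()

print("sensitive parameter indices:", sensitive.tolist())
print("held-out mean absolute error:", float(mae))
```

Dropping the insensitive parameter before fitting shrinks the estimation target, which is the toy analogue of reducing the ensemble size needed to train an accurate inverse model.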
Xingyuan is a senior Earth scientist in the Atmospheric Measurement and Data Sciences Group of the Pacific Northwest National Laboratory. Her main research interests include watershed hydrologic and biogeochemical modeling, stochastic inverse modeling, data assimilation, uncertainty quantification, and data-model integration using machine learning methods. Xingyuan received her PhD in Civil and Environmental Engineering from the University of California, Berkeley. Prior to that, she received her Master’s degree in Civil Engineering from the Hong Kong University of Science and Technology, and her Bachelor’s degree in Hydraulic Engineering from Tsinghua University in Beijing, China.