Comparing the use of all data or specific subsets for training machine learning models in hydrology: A case study of evapotranspiration prediction

Haiyang Shi, Geping Luo, Olaf Hellwich, Xiufeng He, Mingjuan Xie, Wenqiang Zhang, Friday U. Ochege, Qing Ling, Yu Zhang, Ruixiang Gao, Alishir Kurban, Philippe De Maeyer, Tim Van de Voorde

Research output: Article, peer-reviewed

7 Citations (Scopus)
22 Downloads (Pure)

Abstract

Machine learning has been widely used in hydrological modeling. However, the question of whether to use all data for modeling or only a specific subset for modeling and its implications are rarely investigated explicitly. As a case study, combining evapotranspiration (ET) observations from 168 flux stations, meteorological and biophysical variables, we used Random Forests to separately construct an 'All data' model trained with all data and 6 'plant functional type (PFT) specific' models trained with specific PFT data (i.e., Forest, Grassland, Cropland, Shrubland, Savannah, Wetland). We found ET simulations between different specific PFTs are transferable. The 'All data' model captured better ET and had a higher R-squared at 94 of 168 sites, especially in Wetland, Shrubland, Cropland, and Grassland types. Compared to using the 'All data' model, the 'PFT specific' model can further improve the accuracy in high R-squared grassland sites by reducing the effect of confusion of other PFTs and constraining the variance of the training data. When shifting from the 'All data' model to the 'PFT specific' model, the increase in the degree of encapsulation of the training set into the prediction set leads to a decrease in the R-squared. Accuracy pre-evaluation may be necessary before applying models trained from either all data or subset data.
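The per-site comparison described in the abstract (which model has the higher R-squared at each site) can be sketched roughly as follows. This is a minimal illustration, not the authors' code: the function names `r_squared` and `compare_per_site`, the site identifiers, and all numbers are hypothetical, and R-squared is assumed here to be the coefficient of determination (the paper may define it differently, for example as a squared correlation). Model training itself is omitted; the predictions are taken as given.

```python
from statistics import fmean


def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot (assumed definition)."""
    mean_obs = fmean(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1.0 - ss_res / ss_tot


def compare_per_site(sites):
    """Return, for each site, which model scored the higher R-squared.

    `sites` maps a site id to a tuple of three equal-length sequences:
    (observed ET, 'All data' predictions, 'PFT specific' predictions).
    Ties are assigned to the 'PFT specific' model (an arbitrary choice).
    """
    winners = {}
    for site_id, (obs, pred_all, pred_pft) in sites.items():
        r2_all = r_squared(obs, pred_all)
        r2_pft = r_squared(obs, pred_pft)
        winners[site_id] = "all_data" if r2_all > r2_pft else "pft_specific"
    return winners


# Toy numbers invented purely for illustration; they are not from the study.
toy_sites = {
    "site_A": ([1.0, 2.0, 3.0, 4.0], [1.1, 2.1, 2.9, 4.0], [1.5, 1.5, 3.5, 3.5]),
    "site_B": ([2.0, 4.0, 6.0, 8.0], [3.0, 3.0, 7.0, 7.0], [2.1, 3.9, 6.0, 8.1]),
}

winners = compare_per_site(toy_sites)
# Count the sites where the 'All data' model has the higher R-squared.
n_all_better = sum(1 for w in winners.values() if w == "all_data")
print(winners, n_all_better)
```

Tallying `n_all_better` over every site would give a count comparable in kind to the "94 of 168 sites" figure reported above, provided the same R-squared definition is used.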

Original language: English
Article number: 130399
Number of pages: 14
Journal: Journal of Hydrology
Volume: 627
Status: Published - Dec 2023

Bibliographical note

Publisher Copyright:
© 2023 Elsevier B.V.
