Machine learning based unmixing of urban land cover from Sentinel-2 using synthetically mixed training data

Frank Canters, Frederik Priem, Sebastian Van der Linden, Akpona Okujeni

Research output: Unpublished contribution to conference (unpublished abstract)


Monitoring sealed surface cover and understanding its dynamics are becoming increasingly important for defining suitable strategies for sustainable urban development. With better spectral, spatial and temporal characteristics than comparable satellite sensors, Sentinel-2 holds great potential for monitoring sealed surface growth. In this paper we explore the potential of machine learning based unmixing of Sentinel-2 data for mapping urban land cover at sub-pixel scale, using the VIS (Vegetation-Impervious-Soil) conceptual model to describe urban land-cover composition.

Machine learning based regression approaches like support vector regression (SVR) have been shown to perform well in estimating sub-pixel land-cover fractions from remotely sensed data. Like all regression based approaches for unmixing, SVR requires training data that represent all land-cover mixtures likely to occur within an image scene. Traditionally such mixed training data are produced by co-registering reference land-cover data of higher resolution with the image to be unmixed, yet this approach is data intensive and always entails error due to unavoidable geometric shifts between reference data and imagery. More recently, synthetic mixing of endmember libraries has been proposed as an alternative way of producing training data for unmixing image scenes. This approach circumvents the problem of co-registration and makes it possible to generate mixed spectra for all possible combinations of land-cover class fractions. The main objective of this research is to explore the potential of Sentinel-2 data for sub-pixel land-cover fraction estimation in urban areas, using SVR and synthetic mixing of training data. The test area for this study is the Brussels metropolitan region. To cope with the large spectral heterogeneity of urban materials when generating mixed spectra for model training, we use an urban endmember library obtained from high-resolution hyperspectral data (APEX-HyMap), resampled to the spectral and spatial resolution of Sentinel-2.
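
As an illustration of the synthetic mixing idea described above, the following minimal sketch linearly combines endmember spectra with chosen class fractions. The function name `mix_spectra` and the four-band reflectance values are hypothetical and are not taken from the study; the mixed spectrum together with its fractions would form one training sample for a regression model.

```python
def mix_spectra(endmembers, fractions):
    """Linearly mix endmember spectra with fractions that sum to 1."""
    if len(endmembers) != len(fractions):
        raise ValueError("one fraction is needed per endmember")
    if abs(sum(fractions) - 1.0) > 1e-9:
        raise ValueError("fractions must sum to 1")
    n_bands = len(endmembers[0])
    # Each band of the mixture is the fraction-weighted sum of the endmembers.
    return [
        sum(f * spectrum[band] for f, spectrum in zip(fractions, endmembers))
        for band in range(n_bands)
    ]


# Hypothetical four-band reflectances (illustrative values only).
vegetation = [0.04, 0.08, 0.05, 0.45]
roof = [0.20, 0.22, 0.25, 0.30]

# A synthetic pixel made of 60% vegetation and 40% roof material.
mixed = mix_spectra([vegetation, roof], [0.6, 0.4])
print(mixed)
```

In a regression setting, `mixed` would be the predictor vector and the fraction of the target class (here 0.4 for sealed surface) the response value.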

Because synthetic mixing tends to produce very large numbers of mixed spectra, even with a modest endmember library, in this study the number of mixed spectra is reduced by pruning the endmember library through spectral distance thresholding. This entails removing spectrally similar, redundant endmembers based on user-defined normalized RMSE thresholds between endmembers. Binary (two spectra) and ternary (three spectra) linear mixtures with mixing steps of 20% are defined to produce mixed libraries that sufficiently cover the urban feature space. Mapping is then carried out using ensemble SVR trained with stratified samples of mixed spectra. For validation, mapping results are aggregated to the level of building blocks, to reduce the impact of geometric error caused by co-registration of image and reference data.
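
The pruning and mixing steps above can be sketched as follows. This is only a sketch under stated assumptions: the abstract does not define how the RMSE is normalized, so dividing by the mean reflectance of the two spectra is an assumption, as are the greedy pruning order, the exclusion of zero fractions, and the names `normalized_rmse`, `prune_library` and `fraction_sets`.

```python
import math
from itertools import product


def normalized_rmse(a, b):
    """RMSE between two spectra, divided by their mean reflectance (assumed normalization)."""
    rmse = math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)) / len(a))
    mean_reflectance = (sum(a) + sum(b)) / (2 * len(a))
    return rmse / mean_reflectance


def prune_library(spectra, threshold):
    """Greedy pruning: keep a spectrum only if it differs from every kept one by more than threshold."""
    kept = []
    for spectrum in spectra:
        if all(normalized_rmse(spectrum, k) > threshold for k in kept):
            kept.append(spectrum)
    return kept


def fraction_sets(n_members, step_percent=20):
    """All non-zero fraction tuples in steps of step_percent that sum to 100%."""
    levels = range(step_percent, 100, step_percent)
    return [
        tuple(p / 100 for p in combo)
        for combo in product(levels, repeat=n_members)
        if sum(combo) == 100
    ]


# Hypothetical three-band library; the second spectrum nearly duplicates the first.
library = [
    [0.10, 0.20, 0.30],
    [0.101, 0.201, 0.299],
    [0.40, 0.10, 0.05],
]
pruned = prune_library(library, threshold=0.05)
binary = fraction_sets(2)   # e.g. (0.2, 0.8), (0.4, 0.6), ...
ternary = fraction_sets(3)  # e.g. (0.2, 0.2, 0.6), ...
print(len(pruned), len(binary), len(ternary))
```

Under these assumptions each endmember pair yields four binary mixtures and each triple six ternary mixtures, which shows why removing redundant endmembers before mixing sharply limits the size of the mixed library.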

In this study, we also evaluate the potential of combining endmember libraries collected at different sites and from different sensors. The idea of combining endmember libraries fits within the concept of developing generic, universally applicable spectral libraries, reducing the need to define dedicated training data for each image to be unmixed. To test the potential of combining spectral libraries, a publicly available library extracted from HyMap data covering Berlin is used to expand the APEX library developed for Brussels. Mapping results based on the combined APEX-HyMap pruned library yield block-level fractional RMSEs for sealed surfaces, bare soil and vegetation of 0.11, 0.11 and 0.08, respectively. Overall mapping results improve considerably after pruning, and confusion between sealed surfaces and bare soil is strongly reduced, compared to the use of an unpruned spectral library. Results obtained with Sentinel-2, using the unmixing approach proposed in this study, seem promising. Current work focuses on comparing the performance of synthetic mixing with the traditional approach of model training, using mixed signatures extracted from the image to be unmixed.
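
For readers unfamiliar with block-level validation, the sketch below shows one way a fractional RMSE at building-block level could be computed. It assumes an unweighted mean of pixel fractions per block, which is a simplification (a real workflow would likely weight pixels by their overlap with each block). The function name `block_rmse` and all numbers are hypothetical and are not results from the study.

```python
import math
from collections import defaultdict


def block_rmse(pixel_fractions, pixel_blocks, reference_by_block):
    """RMSE between block-mean predicted fractions and reference block fractions."""
    sums = defaultdict(float)
    counts = defaultdict(int)
    for fraction, block in zip(pixel_fractions, pixel_blocks):
        sums[block] += fraction
        counts[block] += 1
    # Squared error per block, after averaging the pixel predictions in that block.
    squared_errors = [
        (sums[block] / counts[block] - reference_by_block[block]) ** 2
        for block in sums
    ]
    return math.sqrt(sum(squared_errors) / len(squared_errors))


# Hypothetical predicted sealed-surface fractions for four pixels in two blocks.
predicted = [0.8, 0.6, 0.2, 0.4]
blocks = ["A", "A", "B", "B"]
reference = {"A": 0.6, "B": 0.4}

rmse = block_rmse(predicted, blocks, reference)
print(round(rmse, 3))
```

Averaging within blocks before computing the error is what dampens the effect of small geometric shifts between the image and the reference data.
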
Original language: English
Number of pages: 2
Publication status: Published - 2017
Event: International Cartographic Conference 2017, Washington D.C - Washington DC, United States
Duration: 2 Jul 2017 → …


Conference: International Cartographic Conference 2017, Washington D.C
Abbreviated title: ICC
Country/Territory: United States
City: Washington DC
Period: 2/07/17 → …


  • urban land cover
  • unmixing
  • support vector regression
  • Sentinel-2
  • synthetic mixing
  • endmember library
  • library pruning

