Web scraping as a tool to decrease data scarcity on geo-hydrological hazards, in tropical Africa

Bram Valkenborg, Olivier Dewitte, Benoît Smets

Research output: Poster


Abstract

Geo-hydrological hazards (GH) and associated disasters remain poorly documented in some parts of the world, creating a systematic bias in existing disaster databases. Furthermore, demographic pressure and urban sprawl increase exposure and vulnerability and, consequently, disaster risk. Although poorly documented, GH and disaster events in data-scarce contexts are sometimes mentioned and described in posts and articles on social media and newspaper websites. These sources represent an opportunity to improve our knowledge and complement existing databases. The present research aims to develop a web scraping tool that extracts information on GH from social media and digital news articles, with a specific focus on tropical Africa. This semi-automated tool retrieves information from articles and posts using a three-step approach: (1) identifying relevant articles through keyword-based and context-based filtering, (2) extracting key information from the relevant texts (i.e., location, timing, and impact), and (3) processing the extracted information to build a GH disaster database. To do so, the tool relies on Natural Language Processing based on Large Language Models (LLMs) such as BERT and GPT. In practice, the approach has limitations (e.g., the choice of keywords, misinterpreted dates and localities, complex or nuanced answers, and missing information), but it remains promising, especially in GH data-scarce regions. Web scraping should therefore be considered a complementary approach to GH data collection. Future work will combine this technique with other data collection techniques, such as remote sensing and citizen science, to further reduce the systematic bias in disaster documentation.
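The three-step approach can be illustrated with a minimal Python sketch. It is not the authors' tool: the poster relies on LLM-based NLP (e.g., BERT, GPT) for context filtering and information extraction, whereas this sketch swaps in simple rule-based stand-ins (a keyword co-occurrence check, regular expressions, and a small gazetteer) so that it runs with the standard library only. The keyword lists, the place names in `GAZETTEER`, the sample texts, and all function names (`is_relevant`, `extract_event`, `build_database`) are hypothetical.

```python
import re
import sqlite3
from dataclasses import dataclass
from typing import Iterable, Optional

# Hypothetical vocabularies; the poster does not publish its keyword lists.
HAZARD_TERMS = {"flood": "flood", "floods": "flood", "flooding": "flood",
                "landslide": "landslide", "landslides": "landslide",
                "mudslide": "landslide"}
IMPACT_TERMS = {"dead", "killed", "died", "homeless", "destroyed", "missing"}
# Hypothetical gazetteer; a tuple keeps the matching order deterministic.
GAZETTEER = ("Bukavu", "Kinshasa", "Uvira", "Kampala")

MONTH_NAMES = ("January", "February", "March", "April", "May", "June", "July",
               "August", "September", "October", "November", "December")
MONTHS = {name: i for i, name in enumerate(MONTH_NAMES, start=1)}
DATE_RE = re.compile(rf"\b(\d{{1,2}}) ({'|'.join(MONTH_NAMES)}) (\d{{4}})\b")
DEATHS_RE = re.compile(r"\b(\d+) (?:people )?(?:dead|killed|died)\b")


@dataclass
class Event:
    hazard: str
    location: Optional[str]
    date: Optional[str]    # ISO date (YYYY-MM-DD) when one is found
    deaths: Optional[int]


def is_relevant(text: str) -> bool:
    """Step 1: keep a text only if one sentence mentions both a hazard term
    and an impact term (a crude stand-in for LLM context filtering)."""
    for sentence in re.split(r"(?<=[.!?])\s+", text):
        words = set(re.findall(r"[a-z]+", sentence.lower()))
        if words.intersection(HAZARD_TERMS) and words & IMPACT_TERMS:
            return True
    return False


def extract_event(text: str) -> Event:
    """Step 2: pull hazard type, location, date and death toll with rules
    (a stand-in for LLM-based information extraction)."""
    words = re.findall(r"[a-z]+", text.lower())
    hazard = next((HAZARD_TERMS[w] for w in words if w in HAZARD_TERMS), "unknown")
    location = next((place for place in GAZETTEER if place in text), None)
    m = DATE_RE.search(text)
    date = f"{m.group(3)}-{MONTHS[m.group(2)]:02d}-{int(m.group(1)):02d}" if m else None
    d = DEATHS_RE.search(text)
    deaths = int(d.group(1)) if d else None
    return Event(hazard, location, date, deaths)


def build_database(texts: Iterable[str]) -> sqlite3.Connection:
    """Step 3: store one row per (hazard, location, date); later reports of
    the same event are ignored by the UNIQUE constraint."""
    conn = sqlite3.connect(":memory:")
    conn.execute("""CREATE TABLE events (
        hazard TEXT, location TEXT, date TEXT, deaths INTEGER,
        UNIQUE (hazard, location, date))""")
    for text in texts:
        if is_relevant(text):
            e = extract_event(text)
            conn.execute("INSERT OR IGNORE INTO events VALUES (?, ?, ?, ?)",
                         (e.hazard, e.location, e.date, e.deaths))
    conn.commit()
    return conn


if __name__ == "__main__":
    articles = [  # hypothetical sample texts
        "Heavy rain hit Bukavu on 3 May 2023. A landslide buried houses and 12 people died.",
        "Residents of Bukavu say the landslide of 3 May 2023 left 12 people dead.",
        "The football club in Kinshasa celebrated its flood of new fans.",
    ]
    conn = build_database(articles)
    print(conn.execute("SELECT * FROM events").fetchall())
    # → [('landslide', 'Bukavu', '2023-05-03', 12)]
```

The two Bukavu reports collapse into a single record, and the third text is filtered out because "flood" appears without any impact term. Deduplicating on (hazard, location, date) is only one simple way to merge reports of the same event; rows with a missing field are never merged, because SQLite treats NULLs as distinct in UNIQUE constraints. The rule-based stand-ins also reproduce some of the limitations listed in the abstract, such as sensitivity to the chosen keywords and to unusual date formats.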
Original language: English
Status: Published - 15 Mar 2024
Event: Belgian Geography Days 2024: Geographers in transition - Université de Namur, Namur, Belgium
Duration: 15 Mar 2024 - 15 Mar 2024
https://bgeoday.unamur.be/

Conference

Conference: Belgian Geography Days 2024
Country/Territory: Belgium
City: Namur
Period: 15/03/24 - 15/03/24
