Graph neural networks for improved retention time predictions and molecular identification

Alexander Kensert, Robbin Bouwmeester, Kyriakos Efthymiadis, Peter Van Broeck, Gert Desmet, Deirdre Cabooter

Research output: Chapter in Book/Report/Conference proceeding › Meeting abstract (Book)

Abstract

Liquid chromatography (LC) is an important analytical tool used in all stages of drug discovery and development; it is, for example, used to identify and quantify degradation products and impurities, and to determine the drug candidate in bioanalytical samples during clinical trials. To improve the analysis of these compounds, machine learning (ML) models can be developed to predict their retention times (RTs) [1]. In this study, a new generation of ML models, namely graph neural networks (GNNs) [2], is developed to improve the accuracy of RT predictions and thus better filter out false positives in the identification of molecules. Classical ML models for RT prediction are based on so-called molecular descriptors (e.g., log P and topological polar surface area (TPSA)). These descriptors are fixed numerical representations of molecules that the ML algorithm operates on directly. Although these descriptors have proven highly predictive, they are not optimized for RT prediction. In contrast, a GNN optimizes the numerical representation of a molecule, based on its atoms and the bonds between these atoms, to predict RTs. Each molecule is treated as a graph G = (V, E), where V is the set of atoms (vertices) and E the set of bonds (edges).
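The molecule-as-graph idea above can be sketched in a few lines of plain Python (a hypothetical minimal example; real pipelines typically build such graphs from SMILES with a cheminformatics library):

```python
# Ethanol (SMILES: CCO) as a graph G = (V, E): V = atoms, E = bonds.
atoms = ["C", "C", "O"]       # vertices V, one entry per heavy atom
bonds = [(0, 1), (1, 2)]      # edges E as index pairs (single bonds)

# Simple one-hot node features over a tiny atom vocabulary.
vocab = {"C": 0, "O": 1, "N": 2}
node_features = [
    [1.0 if vocab[a] == i else 0.0 for i in range(len(vocab))]
    for a in atoms
]

# Adjacency list, the form most GNN message-passing loops consume.
adjacency = {i: [] for i in range(len(atoms))}
for u, v in bonds:
    adjacency[u].append(v)
    adjacency[v].append(u)

print(node_features)  # [[1.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 1.0, 0.0]]
print(adjacency)      # {0: [1], 1: [0, 2], 2: [1]}
```

Unlike fixed descriptors, these atom-level features are only the starting point: the GNN learns how to combine them into a molecule-level representation.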
In this work, a number of interesting GNN variants, including graph convolutional networks (GCNs), message-passing neural networks (MPNNs) and graph attention networks (GATs), are used to improve RT predictions and molecular identification for different LC modes. These GNNs all extract information about the molecule through complex aggregations of information from local structures within the molecular graph.
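The local aggregation these architectures share can be illustrated with a toy message-passing step (a sketch of the general idea, not of any of the specific models named above): each atom's new representation combines its own features with those of its bonded neighbors.

```python
def message_passing_step(node_features, adjacency):
    """One round of sum aggregation over a molecular graph."""
    updated = []
    for node, feats in enumerate(node_features):
        agg = list(feats)  # start from the node's own features
        for neighbor in adjacency[node]:
            # accumulate each neighbor's features element-wise
            agg = [a + b for a, b in zip(agg, node_features[neighbor])]
        updated.append(agg)
    return updated

# Ethanol-like chain C-C-O with one-hot features over {C, O}.
features = [[1.0, 0.0], [1.0, 0.0], [0.0, 1.0]]
adjacency = {0: [1], 1: [0, 2], 2: [1]}

print(message_passing_step(features, adjacency))
# [[2.0, 0.0], [2.0, 1.0], [1.0, 1.0]]
```

Real GNNs interleave such aggregation steps with learned transformations (and, for GATs, learned attention weights over neighbors), and stacking several rounds lets information propagate beyond immediate bonds.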
Preliminary results indicate that the GNNs overall perform significantly better than classical ML models (such as random forests and support vector machines) for RT predictions in LC, with approximately 4 to 25% lower absolute error for all LC modes investigated. Utilizing both the graph structure of a molecule and "low-level" molecular features such as atoms and bonds is demonstrated to extract more informative representations for the downstream task (namely, RT prediction). Importantly, improved RT prediction models could lead to significant improvements in identifying molecules in practice, which could significantly reduce time, effort and costs in, e.g., drug discovery and development.

[1] Bouwmeester, R.; Martens, L.; Degroeve, S. Comprehensive and Empirical Evaluation of Machine Learning Algorithms for Small Molecule LC Retention Time Prediction. Analytical Chemistry. 2019, 91 (5), 3694–3703.
[2] Wu, Z.; Pan, S.; Chen, F.; Long, G.; Zhang, C.; Yu, P. S. A Comprehensive Survey on Graph Neural Networks. IEEE Transactions on Neural Networks and Learning Systems. 2021, 32 (1), 4–24.
Original language: English
Title of host publication: 17th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology (HTC-17)
Subtitle of host publication: Book of Abstracts
Place of publication: Leuven, Belgium
Publisher: KU Leuven
Pages: 88-88
Number of pages: 1
Publication status: Published - May 2022
Event: 17th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology - Ghent, Belgium
Duration: 18 May 2022 – 20 May 2022
Conference number: 17
https://htc-17.com/

Conference

Conference: 17th International Symposium on Hyphenated Techniques in Chromatography and Separation Technology
Abbreviated title: HTC-17
Country/Territory: Belgium
City: Ghent
Period: 18/05/22 – 20/05/22
Internet address: https://htc-17.com/

Keywords

  • deep learning
  • graph neural networks
  • retention time predictions
  • molecular identification
