Towards multivariant pathogenicity predictions - Using machine learning to directly predict and explore disease-causing oligogenic variant combinations

Student thesis: Doctoral Thesis


The emergence of statistical and predictive methods able to analyse genomic data has revolutionised the field of medical genetics, allowing the identification of disease-causing gene variants for several human genetic diseases. Although these approaches have greatly improved our understanding of Mendelian «one gene – one phenotype» genetic models, studying diseases related to more intricate models that involve causative variants in several genes (i.e. oligogenic diseases) remains a challenge, due either to the lack of sufficient disease-specific cohorts to study or to the emergence of cases with incomplete penetrance and phenotypic variability among patients. This situation makes it difficult not only to understand the genetic mechanisms of the disease, but also to offer proper counselling and support to the patient. Until recently, no specialised predictive methods existed to directly predict causative variant combinations for oligogenic diseases. However, with the advent of data on disease-causing variant combinations in gene pairs (i.e. bilocus variant combinations), collected in the Digenic Diseases Database (DIDA), we hypothesised that the transition from single-variant to variant combination pathogenicity predictors is now possible.
To investigate this hypothesis, we organised our research along two main routes. First, we developed an interpretable variant combination pathogenicity predictor for gene pairs, called VarCoPP. To this end, we trained multiple Random Forest algorithms on pathogenic bilocus variant combinations from DIDA against neutral data from the 1000 Genomes Project and investigated the contribution of the incorporated variant, gene and gene pair features to the prediction outcome. Second, we explored the usefulness of different gene pair burden scores based on this novel predictive method for discovering oligogenic signatures in neurodevelopmental diseases (NDDs), which involve a spectrum of monogenic to polygenic cases. We performed a preliminary analysis on the Deciphering Developmental Disorders (DDD) project, which contains exome data of 4195 families, and assessed the capability of our scores to support already diagnosed monogenic cases, to discover gene pairs that are significant compared to control cases, and to link patients into communities based on the genetic burden they share, using the Leiden community detection algorithm.
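As a rough illustration of how several classifiers can be trained on a small pathogenic set against a much larger neutral set, the sketch below builds one balanced training set per model and combines the models' votes into a single score. The balancing scheme, the vote-fraction score and all names (`balanced_training_sets`, `support_score`, the toy identifiers) are assumptions made for this example and are not taken from the thesis; the Random Forest training step itself is left out.

```python
import random

def balanced_training_sets(pathogenic, neutral, n_models, seed=0):
    """Yield one (positives, negatives) pair per model.

    Assumed scheme: every model sees all pathogenic combinations plus an
    equally sized random sample of neutral combinations.
    """
    rng = random.Random(seed)
    k = len(pathogenic)
    for _ in range(n_models):
        yield list(pathogenic), rng.sample(neutral, k)

def support_score(votes):
    """Fraction of models that voted 'pathogenic' (True) for one combination."""
    return sum(votes) / len(votes)

# Toy identifiers only; no real variant data.
pathogenic = [f"pathogenic_{i}" for i in range(5)]
neutral = [f"neutral_{i}" for i in range(100)]
training_sets = list(balanced_training_sets(pathogenic, neutral, n_models=3))

# Hypothetical votes of four models for one variant combination.
example_score = support_score([True, True, False, True])
```

Each yielded pair could then be passed to any classifier implementation; the vote fraction gives a graded score instead of a hard label.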
The performance of VarCoPP shows that it is possible to predict disease-causing bilocus variant combinations with good accuracy, both during cross-validation and when testing on new cases. We also show its relevance for disease-related gene panels, and enhance its clinical applicability by defining confidence zones that guarantee with 95% or 99% probability that a prediction is indeed a true positive, guiding clinical researchers towards the most relevant results. This method and additional biological annotations are incorporated in an online platform called ORVAL, which allows the prediction and exploration of candidate disease-causing oligogenic variant combinations with predicted gene networks, based on patient variant data. Our preliminary analysis of the DDD cohort shows that, compared to non-normalised burden scores, normalisation on the gene disease susceptibility leads to the detection of more diagnosed cases and significant gene pairs, but fewer known genes involved in NDDs, as well as less connected communities. We also show that, regardless of the burden score, the cohort is divided into a large number of patient communities, confirming the significant genetic and phenotypic heterogeneity that is present among the patients. Our predictive method is also able to bring to the surface genes that are not officially known to be involved in NDDs but are nevertheless biologically relevant, thus potentially paving the way for the discovery of novel oligogenic causes.
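One way to picture the confidence zones is as score thresholds calibrated on neutral data: a prediction falls in the 95% zone if at most 5% of neutral combinations score higher than the threshold, and likewise for 99%. The sketch below assumes this definition; the function names, the integer-percent parameter and the toy scores are hypothetical and not taken from the thesis.

```python
def zone_threshold(neutral_scores, confidence_pct):
    """Return a threshold t such that at most (100 - confidence_pct)%
    of the neutral scores are strictly greater than t."""
    ordered = sorted(neutral_scores)
    n = len(ordered)
    rank = -(-confidence_pct * n // 100)  # integer ceil(confidence_pct * n / 100)
    return ordered[rank - 1]

def confidence_zone(score, t95, t99):
    """Label a prediction score by the strictest zone it falls into."""
    if score > t99:
        return "99%"
    if score > t95:
        return "95%"
    return "none"

# Toy neutral scores 0.00, 0.01, ..., 0.99 (not real predictions).
neutral_scores = [i / 100 for i in range(100)]
t95 = zone_threshold(neutral_scores, 95)
t99 = zone_threshold(neutral_scores, 99)
```

Under this assumed definition, a score above `t99` would be reported with the highest confidence, while scores between `t95` and `t99` fall into the 95% zone only.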
Date of Award: 2020
Original language: English
Awarding Institution
  • Université libre de Bruxelles
  • Vrije Universiteit Brussel
Supervisor: Tom Lenaerts (Promotor) & Ann Nowe (Promotor)
