Beyond Monogenic Diseases: A first collection and analysis of digenic diseases

Student thesis: Doctoral Thesis

Abstract

In the next-generation sequencing era, many bioinformatics tools have been developed to assist scientists in their studies on the molecular basis of genetic diseases, often with the aim of identifying the pathogenic variants. As a consequence, in the last decades more than one hundred new disease-gene associations have been discovered. Nevertheless, the genetic basis of many genetic diseases remains undisclosed. It has been shown that many diseases considered monogenic with an imperfect genotype-phenotype correlation or incomplete penetrance are, on the contrary, caused or modulated by more than one mutated gene, meaning that they are in fact oligogenic. Current bioinformatics methods for identifying pathogenic variants are trained and fine-tuned to identify a single variant responsible for a disease. This monogenic-oriented approach cannot be used to explore the impact of combinations of variants in different genes on the complexity and genetic heterogeneity of rare diseases. Digenic diseases are the simplest form of oligogenic disease and thus can provide a conceptual bridge between monogenic diseases and the poorly understood polygenic diseases.
The ambition of this thesis is to collect and analyse digenic data, introducing this topic in the bioinformatics field, where digenic diseases are still an unexplored branch. This work can be divided into two steps: the first is the creation of a central repository containing detailed information on digenic diseases; the second is an analysis of their peculiarities, using machine learning methods to study subclasses of digenic effects.
In the first step, we developed DIDA (DIgenic diseases DAtabase), a novel database that provides for the first time a curated collection of genes and associated variants involved in digenic diseases. Detailed information related to the digenic mechanism has been manually mined from the medical literature. All instances in DIDA were also assigned to one of two subclasses of digenic effects: true digenic (both genes are required for developing the disease) and composite (one gene is sufficient to produce the disease phenotype, while the second alters it or its age of onset).
In the second step, we hypothesized that the digenic effect may be related to some biological properties characterizing digenic combinations. Using machine learning methods, we show that a set of variant, gene and higher-level features can differentiate between the true digenic and composite classes with high accuracy. Moreover, we show that a digenic effect decision profile, extracted from the predictive model, explains why an instance is assigned to either of the two classes.
Together, our results show that digenic disease data generate novel insights, providing a glimpse into the oligogenic realm.
Date of Award: 2018
Original language: English
Awarding Institution
  • Université libre de Bruxelles
  • Vrije Universiteit Brussel
Supervisor: Tom Lenaerts (Promotor) & Sonia Van Dooren (Promotor)
