Genome-wide tests are now pervasive in medicine: some have become routine (e.g. array CGH) and others soon will (exome or whole-genome sequencing). On the one hand, these tests provide an unprecedented opportunity to improve the quality and yield of diagnosis, treatment prescription and research. For example, in rare disease diagnostics, exome sequencing raises the proportion of resolved cases about five-fold, from ~5% to 25%. On the other hand, they require hospitals to seek up-to-date, certified and reliable (bio)informatics solutions to store, manage and analyse the resulting avalanche of data. This project arises from specific needs of the Centers for Medical Genetics of the VUB, ULB and UCL for 1) reliable storage of, and easy access to, clinical and phenotypic data as well as massive amounts of high-throughput genomic data, 2) extracting more information from genetic tests in an automated and validated manner, and 3) deploying the extracted knowledge in routine clinical decisions. These needs are made more urgent by the imminent availability of a joint VUB/ULB Next-Generation Sequencing platform, and by the presumably upcoming reimbursement of genomic tests by the Belgian National Health Service. These technologies produce masses of genomic data waiting to be stored, organised, analysed and valorised. The project aims to answer these challenges in the following ways:
1) Design and creation of a multi-site phenomic and genomic data warehouse that addresses interoperability, privacy, security, scalability and reliability: as of today, no integrated computer solution is available to store, analyse and collaborate on the large amounts of unstructured data already accumulating in the ULB/VUB/UCL medical genetics centers.
2) Development of automated tools (including quality checking and mapping pipelines, pre-processing, dimensionality reduction and multivariate classification) for extracting relevant information from genetic data, with a focus on i) integrating information on copy number variation (CNV) and single-nucleotide variants coming from array CGH and exome sequencing respectively; ii) shifting from a monogenic analysis of genomic data to a multigenic approach, at first by means of feature selection and dimensionality reduction, and at a later stage by re-analysis of very large numbers of array CGH and exome samples, looking for statistical association of variants at one gene or locus with a specific phenotype; iii) analysis of the incidentalome, i.e., the known variants associated with known pathologies that are incidentally discovered by genome-wide profiling but for which the analysis was not initially prescribed. The possibility of screening every patient for disease before the onset of symptoms, in an automated fashion, provides the opportunity of a shift towards preventive genetic medicine.
3) Use of the designed tools to extract new knowledge and transfer it to the medical setting, with a focus on three presumably oligogenic diseases (the cardiac arrhythmia disorder Brugada Syndrome (BrS), epileptic encephalopathies, and cleft lip and/or palate (CL/P)) and possible extension to other diseases (pulmonary hypertension, brain malformations, complex pediatric neurodevelopmental disorders, and mitochondrial diseases). The final goal is to provide clinicians with reliable diagnostic prediction tools and to develop a framework for other presumably oligo/polygenic disorders.
The project will take advantage of the alliance of VUB and ULB scientists from the bioinformatics and medical sides in the newly created (IB)² institute (represented here by the project coordinator, who is also director of the institute) and the VUB-ULB Genomics Core (represented here by the VUB-UZ Brussel coordinator of the Genomics Core and the director of the ULB Center of Genetics), as well as of the expertise of the ULB/VUB spin-off InSilico Genomics in representing, storing, curating and managing huge masses of genomic data. The project will have a number of positive impacts on the health sector in the Brussels Region:
· Improved quality of genomic test analysis
· Improved reproducibility and regular updating of genomic test analysis
as well as on the ICT sector by
· Solving interoperability issues related to the use of several platforms and tools
· Addressing the growing gap between sequencing throughput and the computing capacity available to deal with such big data
· Exploring the big data paradigm for the scalable storage and analysis of huge amounts of genomic data
· Designing scalable, state-of-the-art data mining and machine learning algorithms for dimensionality reduction, classification and prediction in bioinformatics