Non-coding genetic variants in human disease

Project: DFG Projects; DFG Scholarships: Research Fellowships

Project Details

Description

Medical genetics is being transformed by next-generation sequencing (NGS) technologies that enable the investigation of the entire genome. So far, the interpretation of disease-related variation has focused on protein-coding DNA. This focus on just 1.5% of the genome, i.e. the exome, has been exceptionally successful. However, in over 40% of Mendelian phenotypes, no disease-causing coding variants can be found. I propose that this could be because the non-coding sequence has been largely ignored, even though most nucleotides and deleterious variants are non-coding. Recent studies, including my own work, suggest that non-coding mutations contribute to a substantial number of human disease phenotypes and should thus be taken into account in the medical interpretation of genetic variants. My goal in this research project is to achieve a better understanding of genetic variants found in non-coding cis-regulatory elements and their role in human disease.
Several challenges currently hamper the medical interpretation of non-coding DNA. First, the regulatory code of the non-coding genome is poorly understood. Second, there is a dearth of gold-standard datasets for non-coding variants. Third, the sheer number of non-coding variants in each individual and generation makes classical functional work-up strategies impossible. Fourth, the topologically associating domain (TAD) architecture of the genome is an important aspect of gene regulation. Structural variations have the potential to alter TAD boundaries, which allows enhancers from neighbouring domains to ectopically activate genes, causing mis-expression and disease. This enhancer adoption disease mechanism has largely been ignored by human geneticists so far.
To address these challenges, I will apply three experimental approaches:
Aim 1: I will use massively parallel reporter assays (MPRAs) for random saturation mutagenesis of 12 selected disease-associated cis-regulatory elements to investigate the effects of tens of thousands of non-coding regulatory mutations in cell lines. I thereby aim to create a large standardized dataset of functionally validated non-coding variants that will help to develop interpretive schemes for non-coding variants.
Aim 2: I aim to develop a next-generation functional test to evaluate the functional outcome of all de novo non-coding variants from two whole genome sequencing (WGS) studies of patients with severe intellectual disability and congenital limb malformation. I plan to synthesize all de novo variants and the corresponding wild-type sequences and test them in an MPRA in cells.
Aim 3: I aim to evaluate enhancer adoption as a human disease mechanism by multiplexed deletions of TAD boundaries using CRISPR/Cas9 genome editing in cells.
These findings will directly impact future WGS studies and help to identify non-coding genetic variants in cancer and congenital disease.

Key findings

NGS technologies enable the simultaneous investigation of the entire genome. However, 40% of patients remain without a molecular diagnosis despite the fact that, on average, ~80 de novo SNVs are identified per patient. The sheer number of these variants, which are overwhelmingly non-coding, makes classical functional work-up strategies impractical. In the first part of the project, we performed whole genome sequencing of 50 trios affected by congenital limb malformations, and followed this with functional characterization of all observed non-coding de novo SNVs via massively parallel reporter assays (MPRAs). All patients included were array CGH and exome negative. In total, we identified 3,396 de novo mutations in the 50 patients. Five de novo mutations were located in enhancer regions predicted from epigenetic marks. For two of these predicted enhancers, we could show positive in vivo enhancer activity in transgenic mouse reporter assays. Next, we used microarray-based DNA synthesis to create 230 bp oligonucleotides containing all 3,396 de novo non-coding variants and the corresponding wild-type sequences, and performed lentivirus-based MPRAs in human chondrocyte cells and primary mouse limb bud cells. We thus experimentally measured the cis-regulatory activity of 3,396 de novo non-coding mutations in a single, quantitative experiment. We identified 48 variants that showed significant differential expression of the reporter gene. The positive candidates showed up to 5-fold enrichment for ENCODE TF binding motifs, indicating that the mutations are likely to change TF binding and thereby contribute to disease. Our study provides a conceptual framework for the experimental assessment of the large number of de novo non-coding mutations from WGS studies. In the second part of the project, we wanted to study non-coding mutations and structural variants during embryogenesis.
As a first step towards understanding pleiotropic developmental disorders at the organismal scale, we created a single-cell atlas of the embryonic development of wild-type mice. We studied the transcriptional dynamics of mouse development during organogenesis at single-cell resolution. With an improved single-cell combinatorial indexing-based protocol ('sci-RNA-seq3'), we profiled over 2 million single cells derived from 61 mouse embryos staged between 9.5 and 13.5 days of gestation (E9.5 to E13.5; 10-15 replicates per time point) in a single multiplexed experiment. We identified major transcriptional lineages as well as hundreds of expanding, contracting and transient cell types, many of which were only detected because of the depth of cellular coverage obtained here. We explored the dynamics of gene expression within cell types over time, and inferred pseudo-temporal trajectories of mouse limb and muscle development, including examples of distinct trajectories to the same endpoint. These data comprise a foundational resource for mammalian developmental biology. They are also of great interest for human genetics, where genotype-phenotype correlations are extremely difficult to understand because the severity of genetic disorders can differ even between individuals with mutations in the same gene.
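The MPRA comparison described in the first part of the key findings, variant oligonucleotide versus its wild-type counterpart, can be illustrated with a minimal sketch. This is not the project's actual analysis pipeline: the function names, the pseudocount, the paired t statistic and all count values below are assumptions chosen for illustration. The sketch computes reporter activity as log2(RNA/DNA) per replicate and then the allelic effect as the mean difference between the variant and wild-type activities.

```python
import math
from statistics import mean, stdev


def activity(rna_counts, dna_counts, pseudocount=1.0):
    """Per-replicate reporter activity as log2(RNA / DNA) with a pseudocount."""
    return [
        math.log2((rna + pseudocount) / (dna + pseudocount))
        for rna, dna in zip(rna_counts, dna_counts)
    ]


def allelic_effect(ref_rna, ref_dna, alt_rna, alt_dna):
    """Mean log2 activity difference (variant minus wild type) and paired t statistic.

    Each argument is a list of counts, one entry per replicate (at least two).
    """
    ref = activity(ref_rna, ref_dna)
    alt = activity(alt_rna, alt_dna)
    diffs = [a - r for a, r in zip(alt, ref)]
    effect = mean(diffs)
    sd = stdev(diffs)
    if sd == 0:
        # No replicate-to-replicate variation: t is undefined, so report
        # 0 for no effect and a signed infinity otherwise.
        t = 0.0 if effect == 0 else math.copysign(math.inf, effect)
    else:
        t = effect / (sd / math.sqrt(len(diffs)))
    return effect, t


# Hypothetical counts for one variant across three replicates.
effect, t = allelic_effect(
    ref_rna=[199, 399, 99], ref_dna=[99, 199, 49],
    alt_rna=[399, 799, 199], alt_dna=[99, 199, 49],
)
print(effect, t)
```

In a screen of thousands of variants, a per-variant statistic like this would still need a p-value and a correction for multiple testing before any variant is called significant.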
Status: finished
Effective start/end date: 01.01.16 to 31.12.18

UN Sustainable Development Goals

In 2015, UN member states agreed to 17 global Sustainable Development Goals (SDGs) to end poverty, protect the planet and ensure prosperity for all. This project contributes towards the following SDG(s):

  • SDG 3 - Good Health and Well-being

Research Areas and Centers

  • Research Area: Medical Genetics

DFG Research Classification Scheme

  • 205-03 Human Genetics

Fingerprint

Explore the research topics touched on by this project. These labels are generated based on the underlying awards/grants. Together they form a unique fingerprint.
  • The single-cell transcriptional landscape of mammalian organogenesis

    Cao, J., Spielmann, M., Qiu, X., Huang, X., Ibrahim, D. M., Hill, A. J., Zhang, F., Mundlos, S., Christiansen, L., Steemers, F. J., Trapnell, C. & Shendure, J., 28.02.2019, In: Nature. 566, 7745, p. 496-502 7 p.

    Research output: Journal articles, Research, peer-review

    228 Citations (Scopus)