Abstract
Genome-wide assessment of genetic variation is becoming routine in human genetics, but the functional interpretation of non-coding variants in both common and rare diseases remains extremely challenging. Here we employed the massively parallel reporter assay ChIP-STARR-seq to functionally annotate the activity of >140,000 non-coding regulatory elements (NCREs) in human neural stem cells (NSCs) as a model for early brain development. Highly active NCREs show increased sequence constraint and harbour de novo variants in individuals affected by neurodevelopmental disorders. They are enriched for transcription factor (TF) motifs, including those of YY1 and p53 family members, and for primate-specific transposable elements, providing insights into gene regulatory mechanisms in NSCs. Examining the episomal activity of the same NCRE sequences in human embryonic stem cells (ESCs) identified differential activity between cell types and primed NCREs, accompanied by a rewiring of the epigenomic landscape. Leveraging the experimentally measured NCRE activity and the nucleotide composition of the assessed sequences, we built BRAIN-MAGNET, a convolutional neural network that predicts NCRE activity from DNA sequence composition and identifies, within each NCRE, the functionally relevant nucleotides and TF motifs required for its function. The application of BRAIN-MAGNET, together with its functional validation, allows fine-mapping of GWAS loci identified for common neurological traits and prioritization of possible disease-causing rare non-coding variants in currently genetically unexplained individuals with neurogenetic disorders, including those from the Genomics England 100,000 Genomes Project. We foresee that this NCRE atlas and BRAIN-MAGNET will help reduce missing heritability in human genetics by limiting the search space for functionally relevant non-coding genetic variation.
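The general idea of predicting activity from sequence with a convolutional filter, and of locating the nucleotides a prediction depends on, can be illustrated with a toy example. This is a minimal sketch only, not the BRAIN-MAGNET architecture: its layers, learned filters and training are not described here. The sketch one-hot encodes a sequence, scores it with a single hand-set filter followed by global max pooling, and then applies in silico saturation mutagenesis (a common attribution technique, swapped in here as an assumption) to find positions whose substitution lowers the score. The motif `TAGCTA`, the weights and all function names are hypothetical.

```python
from typing import List

BASES = "ACGT"
Matrix = List[List[float]]


def one_hot(seq: str) -> Matrix:
    """Encode DNA as a (length x 4) one-hot matrix with columns A, C, G, T.

    Ambiguous bases such as N become all-zero rows.
    """
    return [[1.0 if base == b else 0.0 for b in BASES] for base in seq.upper()]


def conv_max(x: Matrix, kernel: Matrix) -> float:
    """Slide a (k x 4) kernel along x (stride 1, no padding) and return the
    maximum response: one convolutional filter plus global max pooling."""
    k = len(kernel)
    if len(x) < k:
        raise ValueError("sequence is shorter than the kernel")
    return max(
        sum(x[start + i][j] * kernel[i][j] for i in range(k) for j in range(4))
        for start in range(len(x) - k + 1)
    )


def predict_activity(seq: str, kernels: List[Matrix], weights: List[float]) -> float:
    """Hypothetical toy activity score: weighted sum of max-pooled filter responses."""
    x = one_hot(seq)
    return sum(w * conv_max(x, kern) for kern, w in zip(kernels, weights))


def mutagenesis_drops(seq: str, kernels: List[Matrix], weights: List[float]) -> List[float]:
    """In silico saturation mutagenesis: for each position, the largest decrease
    in predicted activity over all single-base substitutions."""
    ref = predict_activity(seq, kernels, weights)
    drops = []
    for pos, base in enumerate(seq.upper()):
        alt_scores = [
            predict_activity(seq[:pos] + alt + seq[pos + 1:], kernels, weights)
            for alt in BASES
            if alt != base
        ]
        drops.append(ref - min(alt_scores))
    return drops


# Hypothetical hand-set filter built from an arbitrary 6-mer (not a learned motif).
toy_kernel = one_hot("TAGCTA")
seq = "GGGTAGCTAGGG"
print(predict_activity(seq, [toy_kernel], [1.0]))  # 6.0
print(mutagenesis_drops(seq, [toy_kernel], [1.0]))
# [0.0, 0.0, 0.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 0.0, 0.0, 0.0]
```

In a trained network the filters and weights would be learned from measured activity rather than set by hand, and other attribution methods may be used. The toy output only shows that the positions inside the motif are the ones whose substitution lowers the predicted score.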
Publisher
Cold Spring Harbor Laboratory