Abstract
Speech and language disorders are known to have a substantial genetic contribution. Although they are frequently examined as components of other conditions, research on the genetic basis of linguistic differences as separate phenotypic subgroups has so far been limited. Here, we performed an in-depth characterization of speech and language disorders in 52,143 individuals, reconstructing clinical histories through large-scale data mining of the Electronic Medical Records (EMR) of an entire large paediatric healthcare network. The reported frequency of these disorders was highest between 2 and 5 years of age, and the diagnoses spanned twenty-six broad speech and language categories. We used Natural Language Processing to assess the degree to which clinical diagnoses in full-text notes were reflected in ICD-10 diagnosis codes. Aphasia and speech apraxia could be easily retrieved through ICD-10 diagnosis codes, whereas stuttering as a speech phenotype was captured by appropriate ICD-10 codes in only 12% of individuals. We found significant comorbidity of speech and language disorders with neurodevelopmental conditions (30.31%) and, to a lesser degree, with epilepsies (6.07%) and movement disorders (2.05%). The most common genetic disorders retrievable in our EMR analysis were STXBP1 (n = 21), PTEN (n = 20), and CACNA1A (n = 18). When assessing associations of genetic diagnoses with specific linguistic phenotypes, we observed associations of STXBP1 with aphasia (P = 8.57 × 10⁻⁷, CI = 18.62–130.39) and of MYO7A with speech and language development delay due to hearing loss (P = 1.24 × 10⁻⁵, CI = 17.46–Inf).
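The abstract does not state which test produced the P values and confidence intervals above. For rare gene–phenotype pairs, a common choice is Fisher's exact test on a 2×2 table (carriers vs. non-carriers of a genetic diagnosis, with vs. without a phenotype). The following is a minimal stdlib sketch under that assumption; the function name `fisher_exact_2x2` and the table layout are illustrative, not the authors' code. When an off-diagonal cell is zero, the sample odds ratio is infinite, which is also the situation in which an exact CI has an upper bound of Inf.

```python
from math import comb, inf

def fisher_exact_2x2(a, b, c, d):
    """Two-sided Fisher's exact test for the 2x2 table [[a, b], [c, d]].

    Hypothetical layout: rows = carriers / non-carriers of a genetic diagnosis,
    columns = with / without a given speech or language phenotype.
    Returns (sample odds ratio, two-sided p-value).
    """
    row1, row2 = a + b, c + d
    col1 = a + c
    n = row1 + row2
    denom = comb(n, col1)

    def prob(x):
        # Hypergeometric probability of x in the top-left cell, margins fixed
        return comb(row1, x) * comb(row2, col1 - x) / denom

    # Feasible range of the top-left cell given the margins
    lo, hi = max(0, col1 - row2), min(row1, col1)
    p_obs = prob(a)
    # Sum over all tables no more probable than the observed one;
    # the small relative tolerance treats floating-point ties as equal.
    p = sum(prob(x) for x in range(lo, hi + 1) if prob(x) <= p_obs * (1 + 1e-7))
    p = min(p, 1.0)

    if b * c == 0:
        # Zero off-diagonal cell: odds ratio is infinite (or undefined if a*d is 0 too)
        odds_ratio = inf if a * d > 0 else float("nan")
    else:
        odds_ratio = (a * d) / (b * c)
    return odds_ratio, p
```

For example, `fisher_exact_2x2(3, 1, 1, 3)` returns an odds ratio of 9.0 and a two-sided P of 34/70 ≈ 0.486. Computing a confidence interval for the odds ratio requires an additional step not shown in this sketch.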
Finally, in a sub-cohort of 726 individuals with whole exome sequencing data, we identified an enrichment of rare variants in synaptic protein and neuronal receptor pathways, as well as associations of UQCRC1 with expressive aphasia and of WASHC4 with abnormality of speech or vocalization. In summary, our study outlines the landscape of paediatric speech and language disorders, confirming the phenotypic complexity of linguistic traits and identifying novel genotype-phenotype associations. Subgroups of paediatric speech and language disorders differ significantly with respect to the composition of monogenic aetiologies.
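The abstract does not describe how pathway enrichment of rare variants was assessed. One widely used approach is an over-representation test based on the upper tail of the hypergeometric distribution: out of N background genes, K belong to a pathway, n carry qualifying rare variants, and k lie in both sets; the P value is P(X ≥ k). The sketch below assumes that approach; the function name `enrichment_p` and all counts are hypothetical.

```python
from math import comb

def enrichment_p(k, n, K, N):
    """One-sided hypergeometric P(X >= k) for pathway over-representation.

    N: genes tested (background), K: genes in the pathway,
    n: genes carrying qualifying rare variants, k: overlap of the two sets.
    """
    denom = comb(N, n)
    upper = min(n, K)
    # Integer numerator keeps the tail sum exact before the final division
    numer = sum(comb(K, x) * comb(N - K, n - x) for x in range(k, upper + 1))
    return numer / denom
```

With a toy background of 8 genes, a 4-gene pathway, and 4 variant-carrying genes, `enrichment_p(3, 4, 4, 8)` gives 17/70 ≈ 0.243. Real analyses would also need to correct for testing many pathways.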
Publisher
Cold Spring Harbor Laboratory