Abstract
Motivation
Knowledge of relationships and similarities among human diseases can be leveraged in many biomedical applications, such as drug repositioning, biomarker discovery, differential diagnostics, and understanding of disease mechanisms. The recently developed cui2vec resource provides embeddings of approximately 109 thousand biomedical terms. It allows novel measures of disease similarity to be computed that are directly related to patterns in real-world data. We investigate whether disease embeddings from cui2vec can be used to identify functional relations among diseases, to uncover their molecular mechanisms, and to generate hypotheses about novel gene-disease associations and potential drug targets.

Methods and results
We focus on a subset of 3,568 cui2vec terms corresponding to human diseases annotated in the DisGeNET database. A disease-disease distance matrix is computed for this set of diseases from their embedding vectors. Clustering this matrix reveals a well-defined structure, with good correspondence between disease clusters and the top MeSH disease categories. Using pulmonary embolism as an example, we show how disease clustering relates to known mechanistic relations among diseases. Next, we combine disease embeddings with annotated gene-disease associations from DisGeNET to generate joint gene-disease co-embeddings. From these, we identify the molecular pathways most characteristic of each disease group and show that they are highly relevant to known disease physiology. Finally, we leverage disease similarity to generate and rank hypotheses for gene-disease associations. We demonstrate that this method generates highly accurate results and can suggest relevant drug targets.

Conclusions
We show that combining disease embeddings learned from massive amounts of biomedical records with curated data on gene-disease associations can reliably reveal groups of functionally related diseases and their molecular mechanisms, and can predict novel gene-disease associations.
Importantly, our analysis does not require knowledge of associated genes for every disease to identify patterns in the embedding space. It can therefore be used to suggest mechanisms for conditions that are not yet functionally understood. In this respect, our analysis can be applied to identify potential markers and drug targets for poorly characterized orphan and rare diseases. It can also reveal unexpected connections among diseases and between diseases and molecular pathways.
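The similarity-based steps described in this abstract can be illustrated with a minimal sketch. The abstract does not specify the distance measure, the neighborhood size, or the scoring rule, so everything here is an assumption. The sketch uses cosine distance between embedding rows. Candidate genes are ranked with a hypothetical k-nearest-neighbor scheme: each gene is scored by the summed similarity of neighboring diseases already linked to it in a binary disease-by-gene matrix, which stands in for DisGeNET annotations. The names `cosine_distance_matrix` and `rank_candidate_genes` are illustrative only. They are not part of cui2vec or DisGeNET and are not the authors' published method.

```python
import numpy as np


def cosine_distance_matrix(emb: np.ndarray) -> np.ndarray:
    """Pairwise cosine distances (1 - cosine similarity) between embedding rows."""
    norms = np.linalg.norm(emb, axis=1, keepdims=True)
    unit = emb / np.clip(norms, 1e-12, None)  # avoid division by zero
    sim = unit @ unit.T
    # Clip to guard against floating-point values slightly outside [-1, 1].
    return 1.0 - np.clip(sim, -1.0, 1.0)


def rank_candidate_genes(dist: np.ndarray, assoc: np.ndarray,
                         disease_idx: int, k: int = 5):
    """Hypothetical neighbor-based scoring of genes for one query disease.

    dist:  (n_diseases, n_diseases) distance matrix
    assoc: (n_diseases, n_genes) binary matrix of known gene-disease links
    Returns (ranked gene indices with positive score, score vector).
    Genes already linked to the query disease are excluded from the ranking.
    """
    order = np.argsort(dist[disease_idx], kind="stable")
    # The k closest diseases, excluding the query disease itself.
    neighbors = [int(j) for j in order if j != disease_idx][:k]
    # Weight each neighbor by its similarity (1 - distance), floored at zero.
    weights = np.clip(1.0 - dist[disease_idx, neighbors], 0.0, None)
    scores = (weights @ assoc[neighbors]).astype(float)
    scores[assoc[disease_idx] > 0] = 0.0  # hide already-known associations
    ranked = [int(g) for g in np.argsort(-scores, kind="stable") if scores[g] > 0]
    return ranked, scores


if __name__ == "__main__":
    # Toy 2-d "embeddings" for 4 diseases and known links to 3 genes.
    emb = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
    assoc = np.array([[1, 0, 0], [1, 1, 0], [0, 0, 1], [0, 0, 1]])
    d = cosine_distance_matrix(emb)
    print(rank_candidate_genes(d, assoc, disease_idx=0, k=2)[0])  # → [1, 2]
```

Toy vectors replace real cui2vec embeddings here. The same distance matrix could also feed a clustering step, for example SciPy's `scipy.cluster.hierarchy.linkage` after conversion to condensed form with `scipy.spatial.distance.squareform`. The abstract does not state which clustering method the authors used.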
Publisher
Cold Spring Harbor Laboratory