Identification of disease mechanisms and novel disease genes using clinical concept embeddings learned from massive amounts of biomedical data

Author:

Bugrim Andrej

Abstract

Motivation: Knowledge of the relationships and similarities among human diseases can be leveraged in many biomedical applications, such as drug repositioning, biomarker discovery, differential diagnostics, and understanding of disease mechanisms. The recently developed cui2vec resource provides embeddings of approximately 109 thousand biomedical terms and allows computing novel measures of disease similarity that are directly related to patterns in real-world data. We investigate whether disease embeddings from cui2vec can be utilized to identify functional relations among diseases, to uncover their molecular mechanisms, and to generate hypotheses about novel gene-disease associations and potential drug targets.

Methods and results: We focus on a subset of 3,568 cui2vec terms corresponding to human diseases annotated in the DisGeNET database. A disease-disease distance matrix is computed for this set of diseases from their embedding vectors. Clustering of this matrix reveals a well-defined structure with good correspondence between disease clusters and the top MeSH disease categories. Using pulmonary embolism as an example, we show how disease clustering is related to known mechanistic relations among diseases. Next, we combine disease embeddings with annotated gene-disease associations from DisGeNET to generate joint gene-disease co-embeddings. From these we identify the molecular pathways most characteristic of each disease group and show that they are highly relevant to known disease physiology. Finally, we leverage disease similarity to generate and rank hypotheses for gene-disease associations, and we demonstrate that this method generates highly accurate results and can suggest relevant drug targets.

Conclusions: We show that combining disease embeddings learned from massive amounts of biomedical records with curated data on gene-disease associations can reliably reveal groups of functionally related diseases and their molecular mechanisms, and can predict novel gene-disease associations. Importantly, our analysis does not require knowledge of associated genes for every disease to identify patterns in the embedding space, so it can be used to suggest mechanisms for conditions that are not yet functionally understood. In this respect, our analysis can be applied to identify potential markers and drug targets for poorly characterized orphan and rare diseases. It can also reveal unexpected novel connections among diseases and between diseases and molecular pathways.
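The two computational steps named in the abstract, a disease-disease distance matrix built from embedding vectors and similarity-based ranking of gene-disease hypotheses, can be illustrated with a minimal sketch. Everything specific here is an assumption, not the author's method: the abstract does not state which distance measure or scoring rule was used, so cosine distance and a "sum of similarities over the nearest annotated diseases" score are stand-ins. The disease names, gene names, vectors, and the helper functions `distance_matrix` and `rank_candidate_genes` are hypothetical toy placeholders, not cui2vec or DisGeNET data.

```python
import math


def cosine_distance(u, v):
    """Cosine distance between two vectors (assumed metric, not confirmed by the abstract)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return 1.0 - dot / (norm_u * norm_v)


def distance_matrix(embeddings):
    """Return sorted disease names and the full pairwise distance matrix."""
    names = sorted(embeddings)
    matrix = [
        [cosine_distance(embeddings[a], embeddings[b]) for b in names]
        for a in names
    ]
    return names, matrix


def rank_candidate_genes(target, embeddings, known_genes, top_k=2):
    """Rank genes not yet annotated to `target` by summing the cosine
    similarity of the top_k nearest diseases that carry each gene.
    This scoring rule is a hypothetical illustration."""
    neighbours = sorted(
        (d for d in embeddings if d != target),
        key=lambda d: cosine_distance(embeddings[target], embeddings[d]),
    )[:top_k]
    already_known = known_genes.get(target, set())
    scores = {}
    for disease in neighbours:
        similarity = 1.0 - cosine_distance(embeddings[target], embeddings[disease])
        for gene in known_genes.get(disease, set()):
            if gene not in already_known:
                scores[gene] = scores.get(gene, 0.0) + similarity
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy, made-up embeddings and annotations (placeholders only).
toy_embeddings = {
    "disease_A": [1.0, 0.0, 0.0],
    "disease_B": [0.9, 0.1, 0.0],
    "disease_C": [0.0, 1.0, 0.0],
    "disease_D": [0.8, 0.0, 0.2],
}
toy_genes = {
    "disease_B": {"GENE1", "GENE2"},
    "disease_C": {"GENE3"},
    "disease_D": {"GENE1"},
}

names, dist = distance_matrix(toy_embeddings)
ranking = rank_candidate_genes("disease_A", toy_embeddings, toy_genes)
```

Note that `disease_A` has no annotated genes in the toy data, yet it still receives ranked candidates from its neighbours. This mirrors the point made in the conclusions that patterns in the embedding space can be used even for diseases without known gene associations.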

Publisher

Cold Spring Harbor Laboratory

