Automated Annotation of Disease Subtypes

Author:

Ofer Dan, Linial Michal

Abstract

Background: Distinguishing diseases into distinct subtypes is crucial for study, effective treatment, and the discovery of potential cures. The Open Targets Platform integrates biomedical, genetic, and biochemical datasets with the goal of empowering disease ontologies and gene targets. However, many disease annotations remain incomplete, necessitating laborious expert medical input. This is particularly painful for rare and orphan diseases, where resources are limited.

Results: We present a machine learning approach to identifying diseases with potential subtypes, using the approximately 23,000 diseases documented in Open Targets. We derive and describe novel features for predicting diseases with subtypes, using direct evidence. Machine learning models were applied to analyze feature importance and evaluate predictive performance for discovering known subtypes. Our model achieves a high (89.1%) ROC AUC. We integrated pre-trained deep learning language models and showed their benefits. Furthermore, we identify 515 disease candidates predicted to possess previously unannotated subtypes.

Conclusions: Our models can partition diseases into distinct subtypes. This methodology enables a robust, scalable approach for improving knowledge-based annotations and a comprehensive assessment of disease ontology tiers. Our candidates are attractive targets for further study and personalized medicine, potentially aiding in the unveiling of new therapeutic indications for sought-after targets.
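For readers unfamiliar with the ROC AUC metric reported above, the following is a minimal sketch of how it can be computed for a binary "has subtypes / no subtypes" classifier. It is not the authors' code: the function name `roc_auc`, the labels, and the scores are all invented for illustration, and the pairwise formulation shown here is just one standard way to define the metric.

```python
def roc_auc(labels, scores):
    """ROC AUC as the probability that a randomly chosen positive
    example receives a higher score than a randomly chosen negative one
    (ties count as half a win)."""
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in positives:
        for n in negatives:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positives) * len(negatives))


# Toy data, assumed for illustration only:
# label 1 = disease has annotated subtypes, label 0 = it does not.
labels = [1, 1, 1, 0, 0, 0]
scores = [0.9, 0.8, 0.4, 0.7, 0.3, 0.2]  # hypothetical model outputs

auc = roc_auc(labels, scores)
print(f"ROC AUC = {auc:.3f}")
```

On this toy data, 8 of the 9 positive/negative pairs are ranked correctly, so the value is 8/9. A score of 0.5 corresponds to random ranking and 1.0 to perfect ranking.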

Publisher

Cold Spring Harbor Laboratory

