DISNET: a framework for extracting phenotypic disease information from public sources

Author:

Lagunes-García Gerardo1,Rodríguez-González Alejandro12,Prieto-Santamaría Lucía1,García del Valle Eduardo P.1,Zanin Massimiliano1,Menasalvas-Ruiz Ernestina1

Affiliation:

1. Centro de Tecnología Biomédica, Universidad Politécnica de Madrid, Pozuelo de Alarcón, Madrid, Spain

2. Escuela Técnica Superior de Ingenieros Informáticos, Universidad Politécnica de Madrid, Boadilla del Monte, Madrid, Spain

Abstract

Background Within the global endeavour of improving population health, one major challenge is the identification and integration of medical knowledge spread through several information sources. The creation of a comprehensive dataset of diseases and their clinical manifestations based on information from public sources is an interesting approach that allows one not only to complement and merge medical knowledge but also to increase it and thereby to interconnect existing data and analyse and relate diseases to each other. In this paper, we present DISNET (http://disnet.ctb.upm.es/), a web-based system designed to periodically extract the knowledge from signs and symptoms retrieved from medical databases, and to enable the creation of customisable disease networks. Methods We here present the main features of the DISNET system. We describe how information on diseases and their phenotypic manifestations is extracted from Wikipedia and PubMed websites; specifically, texts from these sources are processed through a combination of text mining and natural language processing techniques. Results We further present the validation of our system on Wikipedia and PubMed texts, obtaining the relevant accuracy. The final output includes the creation of a comprehensive symptoms-disease dataset, shared (free access) through the system’s API. We finally describe, with some simple use cases, how a user can interact with it and extract information that could be used for subsequent analyses. Discussion DISNET allows retrieving knowledge about the signs, symptoms and diagnostic tests associated with a disease. It is not limited to a specific category (all the categories that the selected sources of information offer us) and clinical diagnosis terms. It further allows to track the evolution of those terms through time, being thus an opportunity to analyse and observe the progress of human knowledge on diseases. We further discussed the validation of the system, suggesting that it is good enough to be used to extract diseases and diagnostically-relevant terms. At the same time, the evaluation also revealed that improvements could be introduced to enhance the system’s reliability.

Funder

DISNET

Spanish Ministerio de Ciencia, Innovación y Universidades

Mexican Consejo Nacional de Ciencia y Tecnología

Programa de fomento de la investigación y la innovación

Publisher

PeerJ

Subject

General Agricultural and Biological Sciences,General Biochemistry, Genetics and Molecular Biology,General Medicine,General Neuroscience

Reference106 articles.

1. La ciencia de la Wikipedia;Aibar,2017

2. A text mining approach to identify the relationship between gait-Parkinson’s disease (PD) from PD based research articles;Aich,2017

3. Effective mapping of biomedical text to the UMLS Metathesaurus: the MetaMap program;Aronson;Proceedings. AMIA Symposium,2001

4. Evaluation of gastroenterology and hepatology articles on Wikipedia: are they suitable as learning resources for medical students?;Azer;European Journal of Gastroenterology & Hepatology,2014

5. Is wikipedia a reliable learning resource for medical students? Evaluating respiratory topics;Azer;Advances in Physiology Education,2015

同舟云学术

1.学者识别学者识别

2.学术分析学术分析

3.人才评估人才评估

"同舟云学术"是以全球学者为主线,采集、加工和组织学术论文而形成的新型学术文献查询和分析系统,可以对全球学者进行文献检索和人才价值评估。用户可以通过关注某些学科领域的顶尖人物而持续追踪该领域的学科进展和研究前沿。经过近期的数据扩容,当前同舟云学术共收录了国内外主流学术期刊6万余种,收集的期刊论文及会议论文总量共计约1.5亿篇,并以每天添加12000余篇中外论文的速度递增。我们也可以为用户提供个性化、定制化的学者数据。欢迎来电咨询!咨询电话:010-8811{复制后删除}0370

www.globalauthorid.com

TOP

Copyright © 2019-2024 北京同舟云网络信息技术有限公司
京公网安备11010802033243号  京ICP备18003416号-3