Can the vector space model be used to identify biological entity activities?

Author:

Maciel Wesley D, Faria-Campos Alessandra C, Gonçalves Marcos A, Campos Sérgio VA

Abstract

Background

Biological systems are commonly described as networks of interacting entities. Some interactions are already known and form part of current knowledge in the life sciences. Others remain unknown for long periods and are often discovered by chance. In this work we present a model that predicts these unknown interactions from a textual collection using the vector space model (VSM), a well-established information retrieval model. We extend the retrieval ability of the VSM with a transitive closure approach. Our objective is to use the VSM to identify known interactions in the literature and construct a network from them. Based on the interactions established in the network, the model applies the transitive closure to predict and rank new interactions.

Results

We tested and validated our model on a collection of patent claims issued from 1976 to 2005. Of the 266,528 possible interactions in our network, the model identified 1,027 known interactions and predicted 3,195 new interactions. When the model was iterated according to patent issue dates, interactions found in a given past year were often confirmed by patent claims that were not in the collection and were issued in more recent years. Most confirming patent claims were found among the top 100 new interactions obtained from each subnetwork. We also found papers on the Web that confirm newly inferred interactions. For instance, the top-ranked new interaction inferred by our model links the neurotransmitter adrenaline and the androgen receptor gene, and we found a paper reporting that the antiapoptotic effect of adrenaline depends partially on the androgen receptor.

Conclusions

The VSM extended with a transitive closure approach provides a good way to identify biological interactions from textual collections. Specifically in the context of literature-based discovery, the extended VSM helps to identify and rank relevant new interactions even if these interactions occur in only a few documents in the collection. Consequently, we have developed an efficient method for extracting and narrowing down the most promising results to consider as new advances in the life sciences, even when indications of these results are not easily observed in a large mass of documents.
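The abstract does not give implementation details, so the following is only a minimal sketch of the general idea, not the authors' method. It assumes that each entity is represented by a TF-IDF weighted vector over documents, that a positive cosine similarity between two entities counts as a "known" interaction, and that new interactions are scored by the strongest product of similarities along any path in the network (a max-product transitive closure). The function names, the smoothed IDF formula, and the toy entities A to D are all hypothetical.

```python
import math
from itertools import combinations


def entity_vectors(docs, entities):
    """Represent each entity as a TF-IDF weighted vector over documents."""
    n_docs = len(docs)
    vectors = {}
    for entity in entities:
        tf = [doc.count(entity) for doc in docs]
        df = sum(1 for count in tf if count > 0)
        # Smoothed IDF (an assumption) so very common entities keep a nonzero weight.
        idf = math.log(1 + n_docs / df) if df else 0.0
        vectors[entity] = [count * idf for count in tf]
    return vectors


def cosine(u, v):
    """Cosine similarity, returning 0.0 when either vector is all zeros."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def predict_interactions(docs, entities):
    """Return (known, predicted): known pairs with their similarity,
    and unseen pairs ranked by their best transitive path score."""
    vectors = entity_vectors(docs, entities)

    # Known interactions: entity pairs that share at least one document.
    known = {}
    for a, b in combinations(entities, 2):
        sim = cosine(vectors[a], vectors[b])
        if sim > 0:
            known[(a, b)] = sim

    # Max-product transitive closure (Floyd-Warshall over the network).
    best = {a: {b: 0.0 for b in entities} for a in entities}
    for (a, b), sim in known.items():
        best[a][b] = best[b][a] = sim
    for k in entities:
        for i in entities:
            for j in entities:
                best[i][j] = max(best[i][j], best[i][k] * best[k][j])

    # New interactions: reachable pairs that never co-occur directly.
    predicted = [
        ((a, b), best[a][b])
        for a, b in combinations(entities, 2)
        if (a, b) not in known and best[a][b] > 0
    ]
    predicted.sort(key=lambda item: item[1], reverse=True)
    return known, predicted


if __name__ == "__main__":
    # Toy collection: each document is a list of entity mentions.
    toy_docs = [["A", "B"], ["B", "C"], ["C", "D"], ["A", "B"]]
    known, predicted = predict_interactions(toy_docs, ["A", "B", "C", "D"])
    print("known:", sorted(known))
    for pair, score in predicted:
        print("predicted:", pair, round(score, 3))
```

On this toy input the three co-occurring pairs are treated as known, and the three remaining pairs are ranked by the strength of the path connecting them. Multiplying similarities along a path is one simple way to let weakly supported chains rank lower; other scoring rules would fit the same structure.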

Publisher

Springer Science and Business Media LLC

Subject

Genetics, Biotechnology


Cited by 3 articles.

1. A Systematic Review on Literature-based Discovery;ACM Computing Surveys;2020-11-30

2. A systematic review on literature-based discovery workflow;PeerJ Computer Science;2019-11-18

3. Text mining patents for biomedical knowledge;Drug Discovery Today;2016-06
