Modeling and mining term association for improving biomedical information retrieval performance

Authors

Hu Qinmin, Huang Jimmy Xiangji, Hu Xiaohua

Abstract

Background: The growth of biomedical information requires information retrieval systems to provide short, specific answers to complex user queries. Semantic information expressed as free text is straightforward for humans to read but difficult for computers to interpret automatically and search efficiently. One reason is that most traditional information retrieval models assume terms are conditionally independent given a document or passage. We are therefore motivated to consider term associations within different contexts, to help the models capture semantic information and use it to improve biomedical information retrieval performance.

Results: We propose a term association approach that discovers associations among the keywords of a query. Experiments are conducted on the TREC 2004-2007 Genomics data sets and the TREC 2004 HARD data set. The proposed approach is promising and outperforms the baselines and the GSP results. We investigate the parameter settings and different indices: the sentence-based index produces the best results at the document level, the word-based index at the passage level, and the paragraph-based index at the passage2 level. Furthermore, the best term association results always come from the best baseline. The tuning number k in the proposed recursive re-ranking algorithm is discussed and locally optimized to 10.

Conclusions: First, modelling term association with factor analysis to improve biomedical information retrieval is one of the major contributions of our work. Second, the experiments confirm that term association, which considers co-occurrence and dependency among the keywords, produces better results than the baselines, which treat the keywords independently. Third, the baselines are re-ranked according to the importance and reliance of the latent factors behind term associations. These latent factors are determined by the proposed model and by their term appearances in the passages retrieved in the first round.
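The idea of re-ranking a first-round baseline using associations among query keywords can be illustrated with a minimal sketch. This is not the authors' factor-analysis model: it substitutes a plain pairwise co-occurrence count for the latent factors, and the function names, the `weight` parameter, and the scoring formula are all hypothetical choices made only for illustration. The cutoff `k` defaults to 10 to mirror the tuning value mentioned in the abstract.

```python
from itertools import combinations


def cooccurrence_counts(query_terms, passages):
    """For each pair of query terms, count the passages containing both.

    `passages` is a list of token collections (sets or lists).
    """
    counts = {}
    for a, b in combinations(sorted(set(query_terms)), 2):
        counts[(a, b)] = sum(1 for p in passages if a in p and b in p)
    return counts


def rerank(query_terms, ranked, k=10, weight=0.5):
    """Re-rank the top-k items of a baseline ranking.

    `ranked` is a list of (tokens, baseline_score) pairs, best first.
    Hypothetical scoring (an assumption, not the paper's method):
    baseline score plus `weight` times the summed association strength
    of the query-term pairs that a passage contains, where a pair's
    strength is the fraction of top-k passages containing both terms.
    """
    top, rest = ranked[:k], ranked[k:]
    token_sets = [set(tokens) for tokens, _ in top]
    counts = cooccurrence_counts(query_terms, token_sets)
    n = max(len(top), 1)

    def score(item):
        tokens, base = item
        present = set(tokens)
        bonus = sum(c / n for (a, b), c in counts.items()
                    if a in present and b in present)
        return base + weight * bonus

    # Only the top-k passages are reordered; the tail keeps its order.
    return sorted(top, key=score, reverse=True) + rest


if __name__ == "__main__":
    query = ["brca1", "cancer"]
    baseline = [
        (["cancer", "therapy"], 1.0),
        (["brca1", "cancer", "mutation"], 0.9),
        (["protein"], 0.5),
    ]
    for tokens, base in rerank(query, baseline):
        print(base, tokens)
```

In this toy run, the passage that contains both query keywords moves ahead of a passage with a slightly higher baseline score that contains only one of them, which is the kind of effect a term-association model aims for.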

Publisher

Springer Science and Business Media LLC

Subject

Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology


Cited by 5 articles.

1. A topic‐based term frequency normalization framework to enhance probabilistic information retrieval;Computational Intelligence;2020-05

2. Ranking Documents Through Stochastic Sampling on Bayesian Network-based Models;Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval;2016-07-07

3. Pharmacophore and Docking Based Virtual Screening of Validated Mycobacterium tuberculosis Targets;Combinatorial Chemistry & High Throughput Screening;2015-09-03

4. Mapping Collaboration Networks in the World of Autism Research;Autism Research;2014-05-21

5. Modeling Term Associations for Probabilistic Information Retrieval;ACM Transactions on Information Systems;2014-04
