Author:
Hu Qinmin, Huang Jimmy Xiangji, Miao Jun
Abstract
Background
Users want short, specific answers to their questions, placed in context through links to the original sources in the biomedical literature. Information retrieval systems index and retrieve data using a variety of pre-defined search techniques and functions, so ranking strategies differ from one source to another. In this paper, we propose a robust approach to optimizing multi-source information in order to improve genomics retrieval performance.
Results
In the proposed approach, we first consider a common scenario in which a metasearch system has access to multiple baselines, each retrieving and ranking documents/passages with its own model. Then, given baselines selected from multiple sources, we investigate three modified fusion methods (reciprocal, CombMNZ and CombSUM) to re-rank the candidates as the outputs for evaluation. Our empirical study on the 2006 and 2007 genomics data sets demonstrates the viability of the proposed approach for obtaining better performance. Furthermore, the experimental results show that the reciprocal method provides notable improvements over the individual baselines, especially on the passage2-level MAP and the aspect-level MAP.
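To make the three fusion families concrete, the sketch below shows their standard textbook forms in Python. It is an illustration only, not the modified variants evaluated in the paper, whose details are not given in this abstract. The function names, the min-max normalization step, the smoothing constant k = 60 in the reciprocal method, and the toy scores are all assumptions introduced for this example.

```python
from collections import defaultdict


def min_max(run):
    """Rescale one baseline's scores to [0, 1] so runs are comparable (assumed step)."""
    lo, hi = min(run.values()), max(run.values())
    if hi == lo:
        return {doc: 1.0 for doc in run}
    return {doc: (score - lo) / (hi - lo) for doc, score in run.items()}


def comb_sum(runs):
    """CombSUM: add up the scores a candidate receives across all baselines."""
    fused = defaultdict(float)
    for run in runs:
        for doc, score in run.items():
            fused[doc] += score
    return dict(fused)


def comb_mnz(runs):
    """CombMNZ: CombSUM multiplied by the number of baselines that retrieved the candidate."""
    sums = comb_sum(runs)
    hits = defaultdict(int)
    for run in runs:
        for doc in run:
            hits[doc] += 1
    return {doc: sums[doc] * hits[doc] for doc in sums}


def reciprocal_rank(runs, k=60):
    """Reciprocal rank fusion: sum 1 / (k + rank) over baselines; k = 60 is an assumed default."""
    fused = defaultdict(float)
    for run in runs:
        ranked = sorted(run, key=run.get, reverse=True)
        for rank, doc in enumerate(ranked, start=1):
            fused[doc] += 1.0 / (k + rank)
    return dict(fused)


def rerank(fused):
    """Return candidate ids ordered by fused score, best first."""
    return sorted(fused, key=fused.get, reverse=True)


if __name__ == "__main__":
    # Hypothetical scores from three baselines (e.g. BM25, DFR, language model).
    bm25 = {"d1": 0.9, "d2": 0.5, "d3": 0.1}
    dfr = {"d2": 0.8, "d3": 0.6}
    lm = {"d2": 0.7, "d1": 0.2, "d4": 0.4}
    runs = [bm25, dfr, lm]
    for name, fuse in [("CombSUM", comb_sum), ("CombMNZ", comb_mnz),
                       ("reciprocal", reciprocal_rank)]:
        print(name, rerank(fuse(runs)))
```

CombSUM and CombMNZ operate on scores, so in practice each run would usually be passed through something like `min_max` first, whereas the reciprocal method depends only on rank positions and needs no normalization.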
Conclusions
From extensive experiments on the two TREC genomics data sets, we draw the following conclusions. Among the three fusion methods in the proposed approach, the reciprocal method clearly outperforms CombMNZ and CombSUM, and CombSUM works well at the passage2 level compared with CombMNZ. With DFR, BM25 and the language model as the multiple sources, we observe that combining the strongest baselines (the "alliance of giants") achieves the best result. Moreover, within the same combination, the better a baseline performs, the more it contributes. These conclusions are useful for guiding fusion work in biomedical information retrieval.
Publisher
Springer Science and Business Media LLC
Subject
Applied Mathematics, Computer Science Applications, Molecular Biology, Biochemistry, Structural Biology
Cited by
5 articles.