THE VALUE OF AN IN-DOMAIN LEXICON IN GENOMICS QA

Published: 2010-02, Issue: 01, Volume: 08, Page: 147-161
ISSN: 0219-7200
Container-title: Journal of Bioinformatics and Computational Biology
Language: en
Short-container-title: J. Bioinform. Comput. Biol.

Author:

SASAKI YUTAKA¹, MCNAUGHT JOHN¹, ANANIADOU SOPHIA¹

Affiliation:

1. National Centre for Text Mining, School of Computer Science, University of Manchester, MIB, 131 Princess Street, Manchester, M1 7DN, United Kingdom

Abstract

This paper demonstrates that a large-scale lexicon tailored for the biology domain is effective in improving question analysis for genomics Question Answering (QA). We use the TREC Genomics Track data to evaluate the performance of different question analysis methods. It is hard to process textual information in biology, especially in molecular biology, due to a huge number of technical terms which rarely appear in general English documents and dictionaries. To support biological Text Mining, we have developed a domain-specific resource, the BioLexicon. Started in 2006 from scratch, this lexicon currently includes more than four million biomedical terms consisting of newly curated terms and terms collected from existing biomedical databases. While conventional genomics QA systems provide query expansion based on thesauri and dictionaries, it is not clear to what extent a biology-oriented lexical resource is effective for question pre-processing for genomics QA. Experiments on the genomics QA data set show that question analysis using the BioLexicon performs slightly better than that using n-grams and the UMLS Specialist Lexicon.

Publisher

World Scientific Pub Co Pte Lt

Subject

Computer Science Applications, Molecular Biology, Biochemistry

Link

https://www.worldscientific.com/doi/pdf/10.1142/S0219720010004513

References: 3 articles.

1. Literature mining for the biologist: from information retrieval to biological discovery

2. MedPost: a part-of-speech tagger for bioMedical text

3. WordNet

Cited by 6 articles.

1. Building a specialized lexicon for breast cancer clinical trial subject eligibility analysis;Health Informatics Journal;2021-01

2. Constructing a biodiversity terminological inventory;PLOS ONE;2017-04-17

3. Reconstructing Models from Proteomics Data;Computational Systems Neurobiology;2012

4. The BioLexicon: a large-scale terminological resource for biomedical text mining;BMC Bioinformatics;2011-10-12

5. NEW RESULTS IN BIOLOGICAL SEQUENCE ANALYSIS, COMPLEX GENE–DISEASE ASSOCIATION, qPCR CALCULATION, AND BIOLOGICAL TEXT MINING;Journal of Bioinformatics and Computational Biology;2010-10