Affiliation:
1. University of Wisconsin-Milwaukee, Milwaukee, WI
2. National Center for Biotechnology Information, Bethesda, MD
3. University of Texas, Richardson, TX
Abstract
Abbreviations and acronyms are widely used in the biomedical literature, and many of them represent important biomedical concepts. Because many abbreviations are ambiguous (e.g., CAT denotes both chloramphenicol acetyl transferase and computed axial tomography, depending on the context), recognizing the full form associated with each abbreviation is in most cases equivalent to identifying the meaning of the abbreviation. This, in turn, allows us to perform more accurate natural language processing, information extraction, and retrieval. In this study, we have developed supervised approaches to identifying the full forms of ambiguous abbreviations within the context in which they appear. We first automatically generated a dictionary of all possible full forms for each abbreviation; we then treated the in-context full-form prediction for each specific abbreviation occurrence as a case of word-sense disambiguation and applied supervised machine-learning algorithms to it. Because some of the links between abbreviations and their corresponding full forms are explicitly given in the text and can be recovered automatically, we can use these explicit links to automatically provide training data for disambiguating the abbreviations that are not linked to a full form within a text. We evaluated our methods on over 150 thousand abstracts and obtained coverage and precision of 82% and 92%, respectively, under tenfold cross-validation, and 79% and 80%, respectively, when evaluated against an external set of abstracts in which the abbreviations are not defined.
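The idea of using explicitly defined abbreviations as automatically labeled training data can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration, not the authors' implementation: the `DEFINITION` pattern, the initial-letter matching rule, the names `find_definitions` and `SenseClassifier`, and the choice of a small naive Bayes model over context words (the abstract only says "supervised machine-learning algorithms").

```python
import math
import re
from collections import Counter, defaultdict

# Assumed pattern: up to six words followed by an upper-case abbreviation in parentheses.
DEFINITION = re.compile(r"\b((?:[A-Za-z-]+\s+){1,6})\(([A-Z]{2,6})\)")


def tokenize(text):
    """Lower-case the text and split it into alphabetic tokens."""
    return re.findall(r"[a-z]+", text.lower())


def find_definitions(text):
    """Return (abbreviation, full_form) pairs written as 'full form (ABBR)'."""
    pairs = []
    for match in DEFINITION.finditer(text):
        words, abbr = match.group(1).split(), match.group(2)
        candidate = words[-len(abbr):]
        # Simplifying assumption: the initials of the preceding words spell the abbreviation.
        if len(candidate) == len(abbr) and all(
            w[0].lower() == c.lower() for w, c in zip(candidate, abbr)
        ):
            pairs.append((abbr, " ".join(candidate).lower()))
    return pairs


class SenseClassifier:
    """Naive Bayes over context words, with one set of senses per abbreviation."""

    def __init__(self):
        self.sense_counts = defaultdict(Counter)  # abbr -> Counter of full forms
        self.word_counts = defaultdict(Counter)   # (abbr, full form) -> Counter of words
        self.vocab = set()

    def train(self, abstract):
        """Use every explicit definition in the abstract as one labeled example."""
        tokens = tokenize(abstract)
        for abbr, full_form in find_definitions(abstract):
            # Drop the label's own words so the model learns from context only.
            skip = set(tokenize(full_form)) | {abbr.lower()}
            context = [w for w in tokens if w not in skip]
            self.sense_counts[abbr][full_form] += 1
            self.word_counts[(abbr, full_form)].update(context)
            self.vocab.update(context)

    def predict(self, abbr, text):
        """Return the most probable full form, or None for an unseen abbreviation."""
        senses = self.sense_counts.get(abbr)
        if not senses:
            return None
        # Words never seen in training carry no evidence and are ignored.
        words = [w for w in tokenize(text) if w in self.vocab]
        total = sum(senses.values())
        best, best_score = None, -math.inf
        for sense, n in senses.items():
            counts = self.word_counts[(abbr, sense)]
            denom = sum(counts.values()) + len(self.vocab)  # add-one smoothing
            score = math.log(n / total)
            score += sum(math.log((counts[w] + 1) / denom) for w in words)
            if score > best_score:
                best, best_score = sense, score
        return best
```

In this sketch, calling `train` on abstracts that define CAT in both senses builds the dictionary of full forms and the training set in one pass; `predict("CAT", text)` then picks a full form for an abstract in which CAT is not defined.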
Publisher
Association for Computing Machinery (ACM)
Subject
Computer Science Applications; General Business, Management and Accounting; Information Systems
Cited by
12 articles.