Affiliation:
1. Universitat Oberta de Catalunya
Abstract
Terms identified in domain-specific corpora by computational methods must be
validated manually by specialists, which is a highly time-consuming activity.
To reduce this effort and improve term candidate selection, we implemented the
Token Slot Recognition method, a filtering method based on terminological
tokens that is used to rank term candidates extracted from domain-specific
corpora. This paper presents the implementation of this term candidate
filtering method within linguistic and statistical approaches to automatic term
extraction, using several domain-specific corpora in different languages. We
observed that the filtering method improves term candidate selection: it ranks
more terms at the top of the term candidate list than raw frequency does, and
for statistical term extraction the improvement is between 15% and 25% in both
precision and recall. Our analyses further revealed a reduction in the number
of term candidates to be validated manually by specialists. In conclusion, the
Token Slot Recognition filtering method significantly reduces the number of
term candidates extracted automatically from domain-specific corpora, so
specialists can validate them more easily and quickly.
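
The abstract compares rankings by how many true terms reach the top of the candidate list, measured with precision and recall. As a minimal sketch of that kind of comparison, and not the authors' evaluation code, the snippet below computes precision and recall over the top n entries of a ranked list. The function name, the toy candidates, and the gold-standard set are all hypothetical and chosen only for illustration.

```python
def precision_recall_at_n(ranked_candidates, gold_terms, n):
    """Return (precision, recall) for the top-n entries of a ranked list.

    ranked_candidates: term candidates, best-ranked first.
    gold_terms: set of candidates that specialists accepted as terms.
    """
    top = ranked_candidates[:n]
    hits = sum(1 for candidate in top if candidate in gold_terms)
    precision = hits / len(top) if top else 0.0
    recall = hits / len(gold_terms) if gold_terms else 0.0
    return precision, recall


# Hypothetical toy data: the same five candidates under two rankings.
gold = {"term extraction", "token slot", "domain corpus"}
by_frequency = ["of the", "term extraction", "in this", "token slot", "domain corpus"]
by_filter = ["term extraction", "token slot", "of the", "domain corpus", "in this"]

freq_p, freq_r = precision_recall_at_n(by_frequency, gold, 2)
filt_p, filt_r = precision_recall_at_n(by_filter, gold, 2)
print(f"raw frequency: P={freq_p:.2f} R={freq_r:.2f}")
print(f"filtered:      P={filt_p:.2f} R={filt_r:.2f}")
```

In this toy case the filtered ranking places two true terms in the top two positions, while the frequency ranking places one, so both measures are higher for the filtered list at the same cutoff.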
Publisher
John Benjamins Publishing Company
Subject
Library and Information Sciences, Communication, Language and Linguistics
Cited by
5 articles.