Affiliation:
1. Consiglio Nazionale delle Ricerche, Pisa, Italy
2. ITC-irst, Povo (TN), Italy
Abstract
We discuss an approach to the automatic expansion of domain-specific
lexicons, that is, to the problem of extending, for each $c_i$ in a
predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an
initial lexicon $L^i_0$ into a larger lexicon $L^i_1$. Our approach
relies on term categorization, defined as the task of labeling
previously unlabeled terms according to a predefined set of
domains. We approach this as a supervised learning problem in which
term classifiers are built using the initial lexicons as training
data. Dually to classic text categorization tasks in which
documents are represented as vectors in a space of terms, we
represent terms as vectors in a space of documents. We present the
results of a number of experiments in which we use a boosting-based
learning device for training our term classifiers. We test the
effectiveness of our method by using WordNetDomains, a well-known
large set of domain-specific lexicons, as a benchmark. Our
experiments are performed using the documents in the Reuters Corpus
Volume 1 as implicit representations for our terms.
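
The dual representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the toy corpus, and the seed lexicons are hypothetical, and a simple cosine-similarity centroid classifier stands in for the boosting-based learner, purely to keep the example short and dependency-free.

```python
from collections import Counter
from math import sqrt


def term_vectors(documents):
    """Represent each term as a sparse vector {doc_index: count} over documents."""
    vectors = {}
    for doc_id, text in enumerate(documents):
        for term, count in Counter(text.lower().split()).items():
            vectors.setdefault(term, {})[doc_id] = count
    return vectors


def cosine(u, v):
    """Cosine similarity between two sparse vectors stored as dicts."""
    dot = sum(w * v.get(d, 0) for d, w in u.items())
    norm_u = sqrt(sum(w * w for w in u.values()))
    norm_v = sqrt(sum(w * w for w in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0


def centroid(vectors):
    """Sum of the given sparse vectors (cosine ignores the overall scale)."""
    total = Counter()
    for v in vectors:
        total.update(v)
    return dict(total)


def expand_lexicons(documents, seed_lexicons, threshold=0.5):
    """Assign unlabeled terms to every domain whose centroid they resemble.

    Assumption: a centroid classifier replaces the boosting-based learner
    mentioned in the abstract; the threshold value is arbitrary.
    """
    vectors = term_vectors(documents)
    centroids = {
        domain: centroid([vectors[t] for t in terms if t in vectors])
        for domain, terms in seed_lexicons.items()
    }
    labeled = {t for terms in seed_lexicons.values() for t in terms}
    expanded = {domain: set(terms) for domain, terms in seed_lexicons.items()}
    for term, vec in vectors.items():
        if term in labeled:
            continue  # seed terms serve as training data only
        for domain, center in centroids.items():
            if cosine(vec, center) >= threshold:
                expanded[domain].add(term)
    return expanded


# Toy corpus and seed lexicons (hypothetical stand-ins for a real
# document collection and real initial lexicons).
docs = [
    "striker scores goal final match",
    "goalkeeper saves penalty match",
    "bank raises interest rate",
    "bank cuts loan rate",
]
seeds = {"sport": {"match"}, "economy": {"bank"}}
expanded = expand_lexicons(docs, seeds)
```

In this sketch a term may be assigned to several domains or to none, since each domain is tested independently against the threshold.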
Publisher
Association for Computing Machinery (ACM)
Subject
Computational Mathematics, Computer Science (miscellaneous)
Cited by
3 articles.