Automatic expansion of domain-specific lexicons by term categorization

Published:2006-05 Issue:1 Volume:3 Page:1-30
ISSN:1550-4875
Container-title:ACM Transactions on Speech and Language Processing
language:en
Short-container-title:ACM Trans. Speech Lang. Process.

Author:

Avancini Henri¹, Lavelli Alberto², Sebastiani Fabrizio¹, Zanoli Roberto²

Affiliation:

1. Consiglio Nazionale delle Ricerche, Pisa, Italy

2. ITC-irst, Povo (TN), Italy

Abstract

We discuss an approach to the automatic expansion of domain-specific lexicons, that is, to the problem of extending, for each $c_i$ in a predefined set $C = \{c_1, \ldots, c_m\}$ of semantic domains, an initial lexicon $L_i^0$ into a larger lexicon $L_i^1$. Our approach relies on term categorization, defined as the task of labeling previously unlabeled terms according to a predefined set of domains. We approach this as a supervised learning problem in which term classifiers are built using the initial lexicons as training data. Dually to classic text categorization tasks in which documents are represented as vectors in a space of terms, we represent terms as vectors in a space of documents. We present the results of a number of experiments in which we use a boosting-based learning device for training our term classifiers. We test the effectiveness of our method by using WordNetDomains, a well-known large set of domain-specific lexicons, as a benchmark. Our experiments are performed using the documents in the Reuters Corpus Volume 1 as implicit representations for our terms.
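
The dual representation described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the toy corpus, the seed lexicons, the function names, and the similarity threshold are all hypothetical, and a simple nearest-centroid rule with cosine similarity stands in for the boosting-based learner used in the paper. Each term becomes a vector of its occurrence counts across documents, the seed lexicons act as training data, and unlabeled terms are added to every domain whose centroid they resemble closely enough.

```python
import math
from collections import Counter

# Hypothetical toy corpus; the paper uses Reuters Corpus Volume 1.
DOCS = [
    "bank raised interest rates loans",
    "striker scored goal match",
    "investors sold shares interest rates",
    "goalkeeper saved penalty match",
]

# Hypothetical seed lexicons, one per semantic domain (the initial lexicons).
SEEDS = {
    "economy": {"interest", "rates"},
    "sport": {"match"},
}


def term_vectors(docs):
    """Represent each term as a vector of its counts across the documents."""
    vectors = {}
    for j, doc in enumerate(docs):
        for term, count in Counter(doc.split()).items():
            vectors.setdefault(term, [0] * len(docs))[j] = count
    return vectors


def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def centroid(vectors, terms):
    """Component-wise mean of the vectors of the given terms."""
    rows = [vectors[t] for t in terms]
    return [sum(col) / len(rows) for col in zip(*rows)]


def expand_lexicons(vectors, seeds, threshold=0.5):
    """Add each unlabeled term to every domain whose centroid is similar enough.

    Nearest-centroid stand-in for the paper's boosting-based term classifiers.
    """
    centroids = {d: centroid(vectors, terms) for d, terms in seeds.items()}
    labeled = set().union(*seeds.values())
    expanded = {d: set(terms) for d, terms in seeds.items()}
    for term, vec in vectors.items():
        if term in labeled:
            continue
        for domain, center in centroids.items():
            if cosine(vec, center) >= threshold:
                expanded[domain].add(term)
    return expanded


lexicons = expand_lexicons(term_vectors(DOCS), SEEDS)
for domain in sorted(lexicons):
    print(domain, sorted(lexicons[domain]))
```

Because a term is compared against every domain independently, it may end up in more than one expanded lexicon or in none, which matches the multi-domain framing of term categorization better than a forced single-label choice would.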

Publisher

Association for Computing Machinery (ACM)

Subject

Computational Mathematics, Computer Science (miscellaneous)

Link

https://dl.acm.org/doi/pdf/10.1145/1138379.1138380

Reference: 49 articles.

1. Aone C. and Bennett S. W. 1996. Applying machine learning to anaphora resolution. In Connectionist, Statistical and Symbolic Approaches to Learning for Natural Language Processing, S. Wermter, E. Riloff, and G. Scheler, Eds. Springer Verlag, Heidelberg, Germany, 302-314. (Lecture Notes in Computer Science, vol. 1040).

2. Internet categorization and search: A machine learning approach; Chen H.; J. Visual Comm. Image Represent. Special Issue on Digital Libraries, 1996

3. An approach to the automatic construction of global thesauri

Cited by 3 articles.

1. LIWC-UD: Classifying Online Slang Terms into LIWC Categories;14th ACM Web Science Conference 2022;2022-06-26

2. Delineating knowledge management through lexical analysis – a retrospective;Aslib Journal of Information Management;2015-03-16

3. Using wavelet analysis for text categorization in digital libraries: a first experiment with Strathprints;International Journal on Digital Libraries;2012-01-27