Another stemmer

Published: 1990-11 Issue: 3 Volume: 24 Page: 56-61
ISSN:0163-5840
Container-title:ACM SIGIR Forum
language:en
Short-container-title:SIGIR Forum

Author:

Paice Chris D.

Abstract

In natural language processing, conflation is the process of merging or lumping together nonidentical words which refer to the same principal concept. This can relate both to words which are entirely different in form (e.g., "group" and "collection"), and to words which share some common root (e.g., "group", "grouping", "subgroups"). In the former case the words can only be mapped by referring to a dictionary or thesaurus, but in the latter case use can be made of the orthographic similarities between the forms. One popular approach is to remove affixes from the input words, thus reducing them to a stem; if this could be done correctly, all the variant forms of a word would be converted to the same standard form. Since the process is aimed at mapping for retrieval purposes, the stem need not be a linguistically correct lemma or root (see also Frakes 1982).
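
The affix-removal idea described in the abstract can be illustrated with a minimal sketch. This is not the algorithm from the paper: the affix lists, the `MIN_STEM` threshold, and the function names `strip_affixes` and `conflate` are all hypothetical, chosen only so that the abstract's example words ("group", "grouping", "subgroups") reduce to one stem.

```python
# Toy affix stripper for illustration only; NOT the stemmer from the paper.
# The affix lists and the length threshold below are assumptions.
SUFFIXES = ("ings", "ing", "s")  # longer suffixes are tried first
PREFIXES = ("sub",)
MIN_STEM = 3  # assumed minimum stem length, so short words are left alone


def strip_affixes(word: str) -> str:
    """Return a crude stem by removing at most one prefix and one suffix."""
    stem = word.lower()
    for prefix in PREFIXES:
        if stem.startswith(prefix) and len(stem) - len(prefix) >= MIN_STEM:
            stem = stem[len(prefix):]
            break
    for suffix in SUFFIXES:
        if stem.endswith(suffix) and len(stem) - len(suffix) >= MIN_STEM:
            stem = stem[: -len(suffix)]
            break
    return stem


def conflate(words):
    """Group words that share a stem, mapping each stem to its variants."""
    groups = {}
    for word in words:
        groups.setdefault(strip_affixes(word), []).append(word)
    return groups


if __name__ == "__main__":
    print(conflate(["group", "grouping", "subgroups", "collection"]))
```

Under these assumptions the three "group" variants fall into one bucket, while "collection" stays separate. That matches the abstract's point that words of entirely different form can only be related through a dictionary or thesaurus. The stem is just a retrieval key and does not have to be a linguistically valid root.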

Publisher

Association for Computing Machinery (ACM)

Subject

Hardware and Architecture, Management Information Systems

Link

https://dl.acm.org/doi/pdf/10.1145/101306.101310

References: 7 articles.

1. Suffix removal and word conflation;Dawson J. L.;ALLC Bulletin,1974

2. An evaluation of some conflation algorithms for information retrieval

3. Development of a stemming algorithm;Lovins J. B.;Mechanical Translation and Computational Linguistics,1968

4. Paice C. D. 1977: Information Retrieval and the Computer. London: MacDonald & Jane's, 1977; chapter 4.

Cited by 163 articles.

1. SUSTEM: An Improved Rule-based Sundanese Stemmer;ACM Transactions on Asian and Low-Resource Language Information Processing;2024-06-21

2. Is text preprocessing still worth the time? A comparative survey on the influence of popular preprocessing methods on Transformers and traditional classifiers;Information Systems;2024-03

3. Computationally Efficient Labeling of Cancer-Related Forum Posts by Non-clinical Text Information Retrieval;SN Computer Science;2023-09-18

4. A multi-label text message classification method designed for applications in call/contact centre systems;Applied Soft Computing;2023-09

5. Synthesizing Best Practices for Conducting Dictionary-Based Computerized Text Analysis Research;Methods to Improve Our Field;2023-01-18