A parametric methodology for text classification

Published: 2010-06-28; Volume 36, Issue 4, pages 421-442
ISSN: 0165-5515
Journal: Journal of Information Science
Language: English

Author:

Karanikolas Nikitas N.¹, Skourlas Christos²

Affiliation:

1. Department of Informatics, Technological Educational Institute (TEI) of Athens, Athens, Greece

2. Department of Informatics, Technological Educational Institute (TEI) of Athens, Athens, Greece

Abstract

Finding the correct category (class) to which a new, unclassified document belongs is an interesting and difficult problem with a wide range of applications. Our methodology for narrative text classification is based on two techniques: we calculate the distance (similarity) between the new unclassified document and all the pre-classified documents of each class, and we also calculate the similarity of the new document to the ‘average class document’ of each class. In both cases we use key phrases (text phrases or key terms) as the distinctive features, so the proposed method ultimately rests on the automatic extraction of an authority list of key phrases suited to discriminating between classes. In this paper, we apply this methodology to Greek text and present the key concepts, the algorithms, and some critical decisions. We also fine-tune a number of parameters of the mining algorithm. The resulting text classification system, the ideas embedded in it, and the alternative parameter values are evaluated on two training sets (test collections).
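The abstract does not state the similarity measure, the phrase weighting, or how the per-document similarities of a class are combined, so the sketch below only illustrates the two techniques under explicit assumptions. Documents are assumed to be already reduced to lists of key phrases; a hypothetical `authority` set stands in for the extracted authority list; weights are raw phrase counts; similarity is cosine; and the first technique scores a class by the mean similarity to its members. All function names are hypothetical, and this is not the authors' implementation.

```python
from collections import Counter, defaultdict
from collections.abc import Iterable, Mapping
from math import sqrt

# Assumed representation: a document vector maps a key phrase to its weight.
Vector = Mapping[str, float]


def to_vector(phrases: Iterable[str], authority: set[str]) -> Counter:
    """Raw phrase counts, keeping only phrases in the (hypothetical) authority list."""
    return Counter(p for p in phrases if p in authority)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity of two sparse vectors; 0.0 if either one is empty."""
    dot = sum(w * b.get(p, 0.0) for p, w in a.items())
    na = sqrt(sum(w * w for w in a.values()))
    nb = sqrt(sum(w * w for w in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def centroid(vectors: list[Vector]) -> dict[str, float]:
    """'Average class document': component-wise mean of a class's vectors."""
    total = defaultdict(float)
    for v in vectors:
        for p, w in v.items():
            total[p] += w
    return {p: w / len(vectors) for p, w in total.items()}


def classify_by_members(doc: Vector, training: dict[str, list[Vector]]):
    """Technique 1 (mean aggregation is an assumption): compare with every class member."""
    scores = {c: sum(cosine(doc, v) for v in vecs) / len(vecs)
              for c, vecs in training.items()}
    return max(scores, key=scores.get), scores


def classify_by_centroid(doc: Vector, training: dict[str, list[Vector]]):
    """Technique 2: compare with each class's average document."""
    scores = {c: cosine(doc, centroid(vecs)) for c, vecs in training.items()}
    return max(scores, key=scores.get), scores


# Toy data with made-up English phrases (the paper itself works on Greek text).
authority = {"interest rate", "central bank", "inflation", "goal", "match", "league"}
training = {
    "economy": [to_vector(["interest rate", "inflation", "central bank"], authority),
                to_vector(["inflation", "interest rate", "inflation"], authority)],
    "sports": [to_vector(["goal", "match", "league"], authority),
               to_vector(["match", "goal", "goal"], authority)],
}
new_doc = to_vector(["inflation", "central bank", "weather"], authority)
print(classify_by_members(new_doc, training)[0])   # economy
print(classify_by_centroid(new_doc, training)[0])  # economy
```

In this sketch the member-based score keeps per-document detail (close in spirit to nearest-neighbour classification), while the centroid score needs only one comparison per class; the cosine measure and count weighting could be swapped for whatever the paper's parameters actually specify.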

Publisher

SAGE Publications

Subject

Library and Information Sciences, Information Systems

Link

http://journals.sagepub.com/doi/pdf/10.1177/0165551510368620

References (40 articles)

1. Techniques of document management: a review of text retrieval and related technologies

2. Computer assisted information resources navigation

3. A document retrieval system based on nearest neighbour searching

4. Psychiatric Consultation Record Retrieval Using Scenario-Based Representation and Multilevel Mixture Model

Cited by (12 articles)

1. Online learning agents for cost-sensitive topical data acquisition from the web. Intelligent Data Analysis, 2022-04-18.

2. Use of linguistic forms mining in the link analysis of legal documents. Computer Science and Information Systems, 2018.

3. Research Publication Recommendation System based on a Hybrid Approach. Proceedings of the 20th Pan-Hellenic Conference on Informatics, 2016-11-10.

4. A kernel-based centroid classifier using hypothesis margin. Journal of Experimental & Theoretical Artificial Intelligence, 2015-12.

5. Supervised learning for building stemmers. Journal of Information Science, 2015-03-06.