A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

Published:2018-01 Issue:1 Volume:6 Page:1-10
ISSN:2166-7160
Container-title:International Journal of Software Innovation
language:en

Author:

Elhadad Mohamed K.¹, Badran Khaled M.¹, Salama Gouda I.¹

Affiliation:

1. Computer Engineering Department, Military Technical College, Cairo, Egypt

Abstract

Extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important step for enhancing text processing capabilities. This paper proposes a novel approach for building the feature vector used in web text document classification, one that adds semantics to the generated feature vector. The approach exploits the hierarchical structure of the WordNet ontology in two ways. First, it eliminates meaningless words, those that have no semantic relation with any WordNet lexical category, from the generated feature vector; this reduces the feature vector size without losing information about the text. Second, it enriches the feature vector by concatenating each word with its corresponding WordNet lexical category. For the mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TFIDF) is used as the term weighting technique. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach and against an ontology-based reduction technique that does not add semantics to the generated feature vector, in several experiments with five different classifiers (SVM, JRIP, J48, Naive Bayes, and kNN). The experimental results show that the authors' proposed approach achieves better classification accuracy, F-measure, precision, and recall than the other, traditional approaches.
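
The pipeline described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. It assumes a tiny hand-written dictionary (`TOY_LEXICAL_CATEGORIES`, hypothetical) in place of a real WordNet lookup, an underscore as the concatenation separator, and a plain `tf * log(N / df)` weighting; the abstract does not specify any of these details.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup (word -> lexical category).
# The category names follow WordNet's lexicographer-file naming, but the
# word-to-category assignments here are toy values chosen for illustration.
TOY_LEXICAL_CATEGORIES = {
    "bank": "noun.group",
    "loan": "noun.possession",
    "dog": "noun.animal",
    "run": "verb.motion",
    "pay": "verb.possession",
}


def enrich(tokens, categories=TOY_LEXICAL_CATEGORIES):
    """Drop tokens with no lexical category; tag the rest as word_category.

    The underscore separator is an assumption made for this sketch.
    """
    return [f"{t}_{categories[t]}" for t in tokens if t in categories]


def tfidf(docs):
    """Return one {term: weight} dict per document (Vector Space Model).

    Weight = (term count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    doc_freq = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


raw_docs = [
    "the bank will pay the loan xyzzy".split(),
    "the dog will run qwerty".split(),
]

# Step 1: reduce and enrich the vocabulary using the lexical categories.
enriched_docs = [enrich(doc) for doc in raw_docs]

# Step 2: weight the enriched terms with TF-IDF.
vectors = tfidf(enriched_docs)

for doc, vec in zip(enriched_docs, vectors):
    print(doc)
    print(vec)
```

In this toy run, tokens with no entry in the dictionary (such as `xyzzy`) are removed before weighting, so the vocabulary shrinks while each surviving term carries its category label. A real setup would replace the dictionary with an actual WordNet query and would have to choose which sense of each word to use.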

Publisher

IGI Global

Subject

Artificial Intelligence, Computer Graphics and Computer-Aided Design, Computer Networks and Communications, Computer Science Applications, Software

Reference 23 articles.

1. Abdullah Bawakid, M. (2010). A semantic-based text classification system. In Proceedings of the 2010 IEEE 9th International Conference on Cybernetic Intelligent Systems, Reading, UK.

2. A Survey on Semantic Similarity Measure.;S.Anitha;International Journal of Research in Advent Technology,2014

3. An Overview of E-Documents Classification.;B. B.Aurangzeb Khan;International Conference on Machine Learning and Computing,2011

4. Daviddlewis. (n. d.). The Reuters dataset is available to be downloaded in sgml format from. Retrieved January 12, 2017, from http://www.daviddlewis.com/ressources/testcollections/reuters21578/

Cited by 10 articles.

1. An Approach to Ontology-Based Smart Search in E-commerce;Open Semantic Technologies for Intelligent Systems;2022

2. Topic2features: a novel framework to classify noisy and sparse textual data using LDA topic distributions;PeerJ Computer Science;2021-08-11

3. Identifying and Characterizing the Propagation Scale of COVID-19 Situational Information on Twitter: A Hybrid Text Analytic Approach;Applied Sciences;2021-07-15

4. Knowledge organization of node enterprises’ technological innovation under supply chain environment;Complex & Intelligent Systems;2021-05-12

5. tax2vec: Constructing Interpretable Features from Taxonomies for Short Text Classification;Computer Speech & Language;2021-01