A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

Author(s): Mohamed K. Elhadad (Computer Engineering Department, Military Technical College, Cairo, Egypt), Khaled M. Badran (Computer Engineering Department, Military Technical College, Cairo, Egypt) and Gouda I. Salama (Computer Engineering Department, Military Technical College, Cairo, Egypt)
Copyright: 2018
Volume: 6
Issue: 1
Pages: 10
Source title: International Journal of Software Innovation (IJSI)
Editor(s)-in-Chief: Roger Y. Lee (Central Michigan University, USA) and Lawrence Chung (The University of Texas at Dallas, USA)
DOI: 10.4018/IJSI.2018010101

Keywords: Computer Science & IT / Information Science Reference / Systems & Software Design / Systems and Software Engineering

Abstract

Extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important step in enhancing text processing capabilities. This paper proposes a novel approach for building the feature vector used in web text document classification, one that adds semantics to the generated feature vector. The approach exploits the hierarchical structure of the WordNet ontology to eliminate meaningless words, that is, words with no semantic relation to any WordNet lexical category, from the generated feature vector; this reduces the feature vector size without losing information from the text. The approach also enriches the feature vector by concatenating each word with its corresponding WordNet lexical category. For mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency Inverse Document Frequency (TFIDF) is used as the term weighting technique. In several experiments with five different classifiers (SVM, JRIP, J48, Naive-Bayes, and kNN), the proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach and against an ontology-based reduction technique that does not add semantics to the generated feature vector. The experimental results show that the authors' proposed approach is effective, achieving better classification accuracy, F-measure, precision, and recall than these traditional approaches.
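The two feature-vector steps described in the abstract, WordNet-based filtering with word/category concatenation followed by TFIDF weighting in the VSM, can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the tiny `TOY_LEXICON` dictionary is a hypothetical stand-in for WordNet lookups (its labels follow WordNet's lexicographer-file naming, such as `noun.artifact`, but the word-to-category mappings are illustrative), the `word_category` token format is assumed, and the TFIDF variant shown (length-normalized term frequency times log(N/df)) is one common choice; the abstract does not say which variant was used.

```python
import math
from collections import Counter

# HYPOTHETICAL stand-in for WordNet: maps a word to one lexical category.
# A real pipeline would query WordNet instead; these entries are illustrative.
TOY_LEXICON = {
    "computer": "noun.artifact",
    "network": "noun.group",
    "team": "noun.group",
    "run": "verb.motion",
    "fast": "adj.all",
}


def build_features(tokens, lexicon):
    """Drop tokens with no lexical category; tag the rest as word_category."""
    return [f"{t}_{lexicon[t]}" for t in tokens if t in lexicon]


def tfidf(docs):
    """Return one {feature: weight} dict per document using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each feature.
    df = Counter(f for doc in docs for f in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        # An empty document yields an empty vector (no division is performed).
        vectors.append({f: (c / len(doc)) * math.log(n / df[f]) for f, c in tf.items()})
    return vectors


# Usage: "the" has no category in the toy lexicon, so it is filtered out.
raw_docs = [
    ["the", "computer", "network", "run", "fast"],
    ["the", "team", "run", "fast"],
]
docs = [build_features(d, TOY_LEXICON) for d in raw_docs]
vectors = tfidf(docs)
print(docs[0])
print(vectors[0])
```

In a real pipeline the dictionary lookup would be replaced by WordNet queries (for example through NLTK's WordNet interface, where a synset's `lexname()` method returns its lexical category), and a word with several senses would need a rule for choosing one category. Note also that a feature occurring in every document, such as `run_verb.motion` here, receives zero weight under this TFIDF variant.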
