A Novel Approach for Ontology-Based Feature Vector Generation for Web Text Document Classification

Author(s): Mohamed K. Elhadad (Computer Engineering Department, Military Technical College, Cairo, Egypt), Khaled M. Badran (Computer Engineering Department, Military Technical College, Cairo, Egypt) and Gouda I. Salama (Computer Engineering Department, Military Technical College, Cairo, Egypt)
Copyright: 2018
Volume: 6
Issue: 1
Pages: 10
Source title: International Journal of Software Innovation (IJSI)
Editor(s)-in-Chief: Roger Y. Lee (Central Michigan University, USA) and Lawrence Chung (The University of Texas at Dallas, USA)
DOI: 10.4018/IJSI.2018010101

Abstract

Extracting the feature vector used in mining tasks (classification, clustering, etc.) is considered the most important step for enhancing text processing capabilities. This paper proposes a novel approach to building the feature vector used in web text document classification, one that adds semantics to the generated feature vector. The approach exploits the hierarchical structure of the WordNet ontology in two ways. First, it eliminates meaningless words, that is, words with no semantic relation to any WordNet lexical category, which reduces the feature vector size without losing information about the text. Second, it enriches the feature vector by concatenating each word with its corresponding WordNet lexical category. For mining tasks, the Vector Space Model (VSM) is used to represent text documents, and Term Frequency-Inverse Document Frequency (TF-IDF) is used as the term weighting technique. The proposed ontology-based approach was evaluated against the Principal Component Analysis (PCA) approach and against an ontology-based reduction technique that does not add semantics to the generated feature vector, in several experiments with five different classifiers (SVM, JRIP, J48, Naive Bayes, and kNN). The experimental results show that the authors' proposed approach achieves better classification accuracy, F-measure, precision, and recall than the other, traditional approaches.
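
The pipeline described in the abstract (filter words by lexical category, tag each kept word with its category, then weight with TF-IDF) can be illustrated with a minimal sketch. This is not the authors' implementation. The dictionary `TOY_LEXICAL_CATEGORY` is a hypothetical stand-in for a real WordNet lookup (with NLTK, one might instead read a synset's `lexname()`), the word-to-category assignments are illustrative only, and the `word_category` concatenation format and the `tf * log(N / df)` weighting are assumptions, since the abstract does not specify either.

```python
import math
from collections import Counter

# Hypothetical stand-in for a WordNet lookup: word -> lexical category.
# The category names follow WordNet's naming style; the assignments are
# illustrative assumptions, not taken from the paper.
TOY_LEXICAL_CATEGORY = {
    "bank": "noun.group",
    "loan": "noun.possession",
    "river": "noun.object",
    "water": "noun.substance",
    "flow": "verb.motion",
}


def enrich(tokens, lexicon=TOY_LEXICAL_CATEGORY):
    """Drop tokens with no lexical category; tag the rest as word_category."""
    return [f"{t}_{lexicon[t]}" for t in tokens if t in lexicon]


def tfidf(docs):
    """Return one {term: weight} dict per document, using tf * log(N / df)."""
    n = len(docs)
    # Document frequency: number of documents containing each term.
    df = Counter(term for doc in docs for term in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vectors.append(
            {t: (c / len(doc)) * math.log(n / df[t]) for t, c in tf.items()}
        )
    return vectors


raw_docs = [
    "the bank approved xyzzy loan".split(),
    "the river water flow reached the bank".split(),
]
enriched_docs = [enrich(d) for d in raw_docs]
vectors = tfidf(enriched_docs)
```

Words absent from the lexicon (here "the", "approved", "xyzzy", "reached") are dropped, which shrinks the vocabulary, while the surviving features carry their category as part of the term. A term that occurs in every document, such as the tagged form of "bank", receives a weight of zero under this particular TF-IDF variant.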
