IRMA-International.org: Creator of Knowledge
Information Resources Management Association

Machine-Learning-Based External Plagiarism Detecting Methodology From Monolingual Documents: A Comparative Study

Author(s): Saugata Bose (University of Liberal Arts Bangladesh, Bangladesh) and Ritambhra Korpal (Savitribai Phule Pune University, India)
Copyright: 2019
Pages: 17
Source title: Scholarly Ethics and Publishing: Breakthroughs in Research and Practice
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-8057-7.ch021

Abstract

In this chapter, the authors propose an approach that combines natural language processing (NLP) techniques with supervised machine learning algorithms to detect external plagiarism. The main emphasis is on constructing a framework that detects plagiarism in monolingual texts using an n-gram frequency comparison approach. The framework is based on 120 characteristics (features) extracted during the pre-processing steps using simple NLP techniques. Filter metrics are then applied to select the most relevant features, and a supervised classification algorithm is used to classify the documents into four levels of plagiarism. A confusion matrix is then built to estimate the false positives and false negatives. Finally, the authors show that a C4.5 decision-tree-based classifier is more suitable than naive Bayes in terms of accuracy. The framework achieved 89% accuracy with low false positive and false negative rates, and it shows higher precision and recall than the passage similarity method, the sentence similarity method, and the search space reduction method.
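
As a rough illustration of two ideas named in the abstract, n-gram frequency comparison and a confusion matrix over four plagiarism levels, the following minimal Python sketch may help. It is not the authors' implementation: the function names, the choice of word trigrams, the containment score, and the integer level labels 0 to 3 are all assumptions made for this example, and the chapter's 120 features, filter metrics, and C4.5 classifier are not reproduced here.

```python
from collections import Counter


def word_ngrams(text, n=3):
    """Count word n-grams in a text (lowercased, whitespace-tokenized)."""
    tokens = text.lower().split()
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def containment(suspicious, source, n=3):
    """Fraction of the suspicious text's n-grams that also occur in the source.

    Assumption: containment is used here as one simple way to compare
    n-gram frequencies; the chapter may use a different measure.
    """
    susp = word_ngrams(suspicious, n)
    src = word_ngrams(source, n)
    total = sum(susp.values())
    if total == 0:
        return 0.0
    # Counter intersection keeps the minimum count of each shared n-gram.
    shared = sum((susp & src).values())
    return shared / total


def confusion_matrix(actual, predicted, labels):
    """Rows are actual classes, columns are predicted classes."""
    index = {label: i for i, label in enumerate(labels)}
    matrix = [[0] * len(labels) for _ in labels]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix


def accuracy(matrix):
    """Share of documents on the diagonal (correctly classified)."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0


# Hypothetical toy data: two short texts and five labeled documents.
score = containment("the cat sat on the mat", "the cat sat on a mat")  # 0.5
levels = [0, 1, 2, 3]  # placeholder names for the four plagiarism levels
cm = confusion_matrix([0, 1, 2, 3, 1], [0, 1, 2, 2, 1], levels)
```

In this sketch, off-diagonal entries of `cm` are the misclassifications from which false positives and false negatives per level would be read; in a full pipeline a score such as `containment` would be only one of many input features to the classifier.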
