Latent Dirichlet Allocation and POS Tags Based Method for External Plagiarism Detection: LDA and POS Tags Based Plagiarism Detection

Author(s): Ali Daud (King Abdulaziz University, Saudi Arabia & International Islamic University Islamabad, Pakistan), Jamal Ahmad Khan (International Islamic University Islamabad, Pakistan), Jamal Abdul Nasir (International Islamic University Islamabad, Pakistan), Rabeeh Ayaz Abbasi (King Abdulaziz University, Saudi Arabia & Quaid-i-Azam University, Pakistan), Naif Radi Aljohani (King Abdulaziz University, Saudi Arabia) and Jalal S. Alowibdi (Faculty of Computing and Information Technology, University of Jeddah, Saudi Arabia)
Copyright: 2019
Pages: 18
Source title: Scholarly Ethics and Publishing: Breakthroughs in Research and Practice
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-5225-8057-7.ch015


Abstract

In this article, we present a new semantic- and syntactic-based method for external plagiarism detection. In the proposed approach, latent Dirichlet allocation (LDA) and part-of-speech (POS) tags are used together to detect plagiarism between a sample document and a number of source documents. The basic hypothesis is that considering both semantic and syntactic information shared by two text documents may improve the performance of the plagiarism detection task. Our method consists of two steps: a pre-processing step, in which we detect the topics of the sentences in each document using LDA and convert each sentence into an array of POS tags; and a post-processing step, in which the suspicious cases are verified purely on the basis of semantic rules. For two types of external plagiarism (copy and random obfuscation), we empirically compare our approach to the state-of-the-art N-gram based and stop-word N-gram based methods and observe significant improvements.
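
The abstract does not spell out how candidate sentence pairs are selected, so the following is only a minimal sketch of one way the pre-processing output could be used. It assumes each sentence already carries a dominant LDA topic id and a POS tag array (both computed elsewhere), and it flags a pair as suspicious when the topics match and the tag sequences are similar. The names `Sentence`, `pos_similarity`, and `candidate_pairs`, the use of `difflib.SequenceMatcher` as the similarity measure, and the 0.8 threshold are all illustrative assumptions, not the authors' method.

```python
from difflib import SequenceMatcher
from typing import List, NamedTuple, Sequence, Tuple


class Sentence(NamedTuple):
    """Hypothetical container for one pre-processed sentence."""
    text: str
    topic: int                 # dominant LDA topic id, assumed precomputed
    pos_tags: Tuple[str, ...]  # POS tag array, assumed precomputed


def pos_similarity(a: Sequence[str], b: Sequence[str]) -> float:
    """Similarity of two POS tag sequences, in the range [0, 1]."""
    return SequenceMatcher(None, a, b, autojunk=False).ratio()


def candidate_pairs(
    suspicious: List[Sentence],
    sources: List[Sentence],
    threshold: float = 0.8,  # assumed cut-off, not taken from the chapter
) -> List[Tuple[int, int, float]]:
    """Return (suspicious index, source index, score) for flagged pairs."""
    pairs = []
    for i, s in enumerate(suspicious):
        for j, t in enumerate(sources):
            # Semantic filter: only compare sentences with the same topic.
            if s.topic != t.topic:
                continue
            # Syntactic filter: compare the POS tag arrays.
            score = pos_similarity(s.pos_tags, t.pos_tags)
            if score >= threshold:
                pairs.append((i, j, score))
    return pairs
```

Pairs returned by such a filter would only be candidates. In the method described above, suspicious cases are still verified in the post-processing step.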
