Combining Data Preprocessing Methods With Imputation Techniques for Software Defect Prediction

Author(s): Misha Kakkar (Amity University Uttar Pradesh, Noida, India), Sarika Jain (Amity Institute of Information Technology, Amity University Uttar Pradesh, Noida, India), Abhay Bansal (Department of Computer Science and Engineering, Amity University Uttar Pradesh, Noida, India) and P.S. Grover (KIIT Group of Colleges, Gurgaon, India)
Copyright: 2021
Pages: 20
Source title: Research Anthology on Recent Trends, Tools, and Implications of Computer Programming
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-7998-3016-0.ch081

Abstract

Software Defect Prediction (SDP) models are used to predict whether software is clean or buggy, using historical data collected from various software repositories. The data collected from such repositories may contain missing values. To estimate these missing values, imputation techniques are used, which rely on the completely observed values in the dataset. The objective of this study is to identify the best-suited imputation technique for handling missing values in SDP datasets. In addition to identifying the imputation technique, the authors investigate the most appropriate combination of imputation technique and data preprocessing method for building an SDP model. In this study, four combinations of imputation technique and data preprocessing method are examined using the improved NASA datasets. These combinations are used along with five different machine-learning algorithms to develop models. The performance of these SDP models is then compared using traditional performance indicators. Experimental results show that, among the imputation techniques considered, linear regression gives the most accurate imputed values. The combination of linear regression with a correlation-based feature selector outperforms all other combinations. To validate the significance of combining data preprocessing methods with imputation, the findings are applied to open source projects. The results on those projects are consistent with the conclusion above.
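
To make the idea in the abstract concrete, the following is a minimal sketch of regression-based imputation followed by a correlation-based feature ranking. It is not the authors' implementation. The metric names (`loc`, `complexity`), the toy values, and all function names are hypothetical, and the ranking step is a simplification: it orders features by absolute Pearson correlation with the defect label rather than performing the full subset evaluation used by correlation-based feature selectors.

```python
from statistics import mean


def fit_line(xs, ys):
    """Ordinary least squares for y = intercept + slope * x."""
    mx, my = mean(xs), mean(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return my - slope * mx, slope


def impute_with_regression(rows, target, predictor):
    """Fill missing (None) values of `target` from `predictor`.

    The regression line is fitted only on rows where both values are observed.
    """
    complete = [r for r in rows
                if r[target] is not None and r[predictor] is not None]
    intercept, slope = fit_line([r[predictor] for r in complete],
                                [r[target] for r in complete])
    filled = []
    for r in rows:
        r = dict(r)  # copy so the input rows are left unchanged
        if r[target] is None and r[predictor] is not None:
            r[target] = intercept + slope * r[predictor]
        filled.append(r)
    return filled


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def rank_features(rows, features, label):
    """Order features by absolute correlation with the label (simplified selector)."""
    labels = [r[label] for r in rows]
    scores = {f: abs(pearson([r[f] for r in rows], labels)) for f in features}
    return sorted(features, key=lambda f: scores[f], reverse=True)


# Hypothetical module-level metrics; one complexity value is missing.
rows = [
    {"loc": 10, "complexity": 2, "defective": 0},
    {"loc": 20, "complexity": 4, "defective": 0},
    {"loc": 30, "complexity": None, "defective": 1},
    {"loc": 40, "complexity": 8, "defective": 1},
]

filled = impute_with_regression(rows, target="complexity", predictor="loc")
ranking = rank_features(filled, ["loc", "complexity"], "defective")
```

In this sketch the imputation runs before the feature ranking, so the selector sees a complete dataset. The highest-ranked features would then be passed to whichever classifier is being evaluated.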
