Enhancing Data Quality at ETL Stage of Data Warehousing

Author(s): Neha Gupta (Manav Rachna International Institute of Research and Studies, Faridabad, India) and Sakshi Jolly (Manav Rachna International Institute of Research and Studies, Faridabad, India)
Copyright: 2021
Volume: 17
Issue: 1
Pages: 18
Source title: International Journal of Data Warehousing and Mining (IJDWM)
Editor(s)-in-Chief: Eric Pardede (La Trobe University, Australia) and Kiki Adhinugraha (La Trobe University, Australia)
DOI: 10.4018/IJDWM.2021010105

Abstract

Data usually arrives in a data warehouse from multiple sources in different formats and falls into three categories: structured, semi-structured, and unstructured. Various data mining technologies are used to collect, refine, and analyze this data, which in turn raises the problem of data quality management. Data purgation takes place when the data passes through the ETL (extract, transform, load) process, in order to maintain and improve data quality. The data may contain unnecessary information and inappropriate symbols, which can be classified as dummy values, cryptic values, or missing values. The present work improves the expectation-maximization algorithm with the dot product to handle cryptic data, the DBSCAN method with the Gower metric to handle dummy values, Ward's algorithm with the Minkowski distance to improve the results for contradicting data, and the K-means algorithm with the Euclidean distance metric to handle missing values in a dataset. These distance metrics improved data quality and helped provide consistent data for loading into a data warehouse.
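
The abstract does not spell out how the improved K-means step works, so the following is only a generic sketch of one common way to impute missing values with K-means and Euclidean distance, not the authors' algorithm. The function names (`partial_euclidean`, `init_centroids`, `kmeans_impute`), the farthest-point seeding, the rescaled partial distance, and the sample rows are all assumptions made for illustration. Missing entries are represented as `None`, and at least `k` rows are assumed to be complete.

```python
import math


def partial_euclidean(row, centroid):
    """Euclidean distance computed over the features observed in `row`."""
    sq = [(x - c) ** 2 for x, c in zip(row, centroid) if x is not None]
    if not sq:
        return 0.0
    # Rescale so rows with fewer observed features stay comparable.
    return math.sqrt(sum(sq) * len(row) / len(sq))


def init_centroids(rows, k):
    """Deterministic farthest-point seeding from complete rows (an assumption)."""
    complete = [list(r) for r in rows if None not in r]
    if len(complete) < k:
        raise ValueError("need at least k complete rows to seed the centroids")
    centroids = [complete[0]]
    while len(centroids) < k:
        farthest = max(
            complete,
            key=lambda r: min(partial_euclidean(r, c) for c in centroids),
        )
        centroids.append(list(farthest))
    return centroids


def kmeans_impute(rows, k=2, iterations=20):
    """Cluster rows with K-means, then fill each None with its centroid value."""
    centroids = init_centroids(rows, k)
    labels = [0] * len(rows)
    for _ in range(iterations):
        # Assignment step: nearest centroid under the partial distance.
        labels = [
            min(range(k), key=lambda c: partial_euclidean(row, centroids[c]))
            for row in rows
        ]
        # Update step: per-column mean of the observed values in each cluster.
        for c in range(k):
            members = [row for row, label in zip(rows, labels) if label == c]
            for j in range(len(centroids[c])):
                observed = [row[j] for row in members if row[j] is not None]
                if observed:
                    centroids[c][j] = sum(observed) / len(observed)
    return [
        [centroids[label][j] if v is None else v for j, v in enumerate(row)]
        for row, label in zip(rows, labels)
    ]


# Hypothetical sample: two well-separated groups, two missing entries.
sample = [
    [1.0, 2.0], [1.2, None], [0.8, 2.2],
    [9.0, 10.0], [None, 9.8], [9.2, 10.2],
]
filled = kmeans_impute(sample, k=2)
```

Observed values are left untouched; only the `None` entries are replaced, each by the corresponding coordinate of the centroid its row was assigned to. In an ETL setting, a step like this would sit in the transform phase, before the cleaned rows are loaded into the warehouse.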
