Data Quality in Data Warehouses
Author(s): William E. Winkler (U.S. Bureau of the Census, USA)
Copyright: 2005
Pages: 5
Source title: Encyclopedia of Data Warehousing and Mining
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-59140-557-3.ch058
Abstract
Fayyad and Uthurusamy (2002) have stated that the majority of the work (often months or years) in creating a data warehouse goes into cleaning up duplicates and resolving other anomalies. This article provides an overview of two methods for improving data quality. The first, data cleaning, finds duplicates within a file or across files. The second, edit/imputation, enforces business rules and fills in missing data. The fastest data-cleaning methods are suitable for files with hundreds of millions of records (Winkler, 1999b, 2003b); the fastest edit/imputation methods are suitable for files with millions of records (Winkler, 1999a, 2004b).
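To make the two methods concrete, here is a minimal illustrative sketch, not Winkler's actual algorithms: duplicate detection typically normalizes records, restricts comparisons to candidates sharing a blocking key (avoiding the full pairwise comparison), and scores candidate pairs with a string comparator; edit/imputation checks each field against a business rule and fills values that are missing or fail the rule. The helper names (`find_duplicates`, `edit_impute`) and the use of the standard-library `difflib.SequenceMatcher` in place of a production comparator such as Jaro-Winkler are assumptions made for this sketch.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(record):
    # Canonicalize a name string: lowercase, strip punctuation, sort tokens
    # so that "Smith, John" and "John Smith" compare as equal.
    tokens = record.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(tokens))

def find_duplicates(records, threshold=0.9):
    """Return (i, j, score) for candidate duplicate pairs.

    Blocking: only records whose normalized form shares a first character
    are compared, a crude stand-in for the sort-key blocking used in
    large-scale matching.
    """
    blocks = {}
    for i, rec in enumerate(records):
        blocks.setdefault(normalize(rec)[:1], []).append(i)
    pairs = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = SequenceMatcher(
                None, normalize(records[i]), normalize(records[j])
            ).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

def edit_impute(row, rules, imputers):
    """Apply edit rules to one record; impute fields that are missing or fail."""
    out = dict(row)
    for field, rule in rules.items():
        value = out.get(field)
        if value is None or not rule(value):
            out[field] = imputers[field](out)  # fill from a default/donor rule
    return out
```

For example, `find_duplicates(["Smith, John A.", "John A Smith", "Jane Doe"])` flags the first two records as a duplicate pair, and `edit_impute({"age": 999}, {"age": lambda v: 0 <= v <= 120}, {"age": lambda r: 35})` replaces the out-of-range age with the imputed value.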