Data Quality in Data Warehouses
Author(s): William E. Winkler (U.S. Bureau of the Census, USA)
Copyright: 2005
Pages: 5
Source title: Encyclopedia of Data Warehousing and Mining
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-59140-557-3.ch058
Abstract
Fayyad and Uthurusamy (2002) have stated that the majority of the work (often months or years) in creating a data warehouse goes into cleaning up duplicates and resolving other anomalies. This article provides an overview of two methods for improving data quality. The first, data cleaning, finds duplicates within a file or across files. The second, edit/imputation, enforces business rules and fills in missing data. The fastest data-cleaning methods are suitable for files with hundreds of millions of records (Winkler, 1999b, 2003b); the fastest edit/imputation methods are suitable for files with millions of records (Winkler, 1999a, 2004b).
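To make the two methods concrete, here is a minimal illustrative sketch, not Winkler's actual algorithms: duplicate detection typically normalizes records, restricts comparisons to candidates sharing a blocking key (avoiding the full pairwise comparison), and scores candidate pairs with a string comparator; edit/imputation checks each field against a business rule and fills values that are missing or fail the rule. The helper names (`find_duplicates`, `edit_impute`) and the use of the standard-library `difflib.SequenceMatcher` in place of a production comparator such as Jaro-Winkler are assumptions made for this sketch.

```python
from difflib import SequenceMatcher
from itertools import combinations

def normalize(record):
    # Canonicalize a name string: lowercase, strip punctuation, sort tokens
    # so that "Smith, John" and "John Smith" compare as equal.
    tokens = record.lower().replace(",", " ").replace(".", " ").split()
    return " ".join(sorted(tokens))

def find_duplicates(records, threshold=0.9):
    """Return (i, j, score) for candidate duplicate pairs.

    Blocking: only records whose normalized form shares a first character
    are compared, a crude stand-in for the sort-key blocking used in
    large-scale matching.
    """
    blocks = {}
    for i, rec in enumerate(records):
        blocks.setdefault(normalize(rec)[:1], []).append(i)
    pairs = []
    for members in blocks.values():
        for i, j in combinations(members, 2):
            score = SequenceMatcher(
                None, normalize(records[i]), normalize(records[j])
            ).ratio()
            if score >= threshold:
                pairs.append((i, j, round(score, 2)))
    return pairs

def edit_impute(row, rules, imputers):
    """Apply edit rules to one record; impute fields that are missing or fail."""
    out = dict(row)
    for field, rule in rules.items():
        value = out.get(field)
        if value is None or not rule(value):
            out[field] = imputers[field](out)  # fill from a default/donor rule
    return out
```

For example, `find_duplicates(["Smith, John A.", "John A Smith", "Jane Doe"])` flags the first two records as a duplicate pair, and `edit_impute({"age": 999}, {"age": lambda v: 0 <= v <= 120}, {"age": lambda r: 35})` replaces the out-of-range age with the imputed value.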