
Applying Data Mining Techniques to Improve Data Quality of Patient Records

Author(s): Narasimhaiah Gorla (Wayne State University, USA) and Chow Y.K. Bennon (Hong Kong Polytechnic University, Hong Kong)
Copyright: 2002
Pages: 5
Source title: Issues & Trends of Information Technology Management in Contemporary Organizations
Source Editor(s): Mehdi Khosrow-Pour, D.B.A. (Information Resources Management Association, USA)
DOI: 10.4018/978-1-930708-39-6.ch087
ISBN13: 9781930708396
EISBN13: 9781466641358


In Hong Kong, public hospitals operate under the control and supervision of the Hospital Authority. The demographic and clinical details of each patient are recorded in the databases of various hospital information systems. Errors in patient data lead doctors to erroneous conclusions and cost time to resolve. The causes of these errors include incorrect data entry, information that patients do not provide when they enter the hospital, and improper identification of patients (especially tourists). As a result, several records belonging to the same patient may appear as records of different patients. In this research, we illustrate how a hospital can use clustering, a data mining technique, to group “similar” patients together. We use two algorithms: hierarchical clustering and partitioned clustering. We also combined these two algorithms into a “hybrid” clustering algorithm and applied it to the patient data using a C program. Six attributes of patient data served as the basis for similarity between patient records: Sex, DOB, Name, Marital Status, District, and Telephone number. We assigned weights to these attributes when computing similarity. We found that the hybrid algorithm grouped records more accurately than the other algorithms, had a smaller mean square error, and executed faster. Because of the privacy ordinance, true patient data are not shown; only simulated data are used.
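
To make the idea of weighted similarity over the six attributes concrete, the following is a minimal sketch, not the authors' implementation. The study used a C program; Python is used here only for brevity. The weight values, the 0.8 threshold, the field names, the fuzzy name comparison, and the single-linkage grouping rule are all assumptions chosen for illustration, since the abstract does not specify them. The records are simulated.

```python
from difflib import SequenceMatcher

# Hypothetical weights: the abstract says weights were used but gives no values.
WEIGHTS = {
    "sex": 1.0,
    "dob": 3.0,
    "name": 3.0,
    "marital_status": 0.5,
    "district": 1.0,
    "telephone": 1.5,
}


def attribute_similarity(field, a, b):
    """Score one attribute in [0, 1]; a missing value on either side scores 0."""
    if not a or not b:
        return 0.0
    if field == "name":
        # Assumed: names are compared fuzzily to tolerate typing errors.
        return SequenceMatcher(None, a.lower(), b.lower()).ratio()
    return 1.0 if a == b else 0.0


def record_similarity(r1, r2, weights=WEIGHTS):
    """Weighted average of the per-attribute scores, in [0, 1]."""
    total = sum(weights.values())
    score = sum(
        w * attribute_similarity(f, r1.get(f), r2.get(f))
        for f, w in weights.items()
    )
    return score / total


def group_records(records, threshold=0.8):
    """Single-linkage grouping: link any two records whose similarity
    reaches the threshold, then return the connected groups of indices."""
    parent = list(range(len(records)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]
            i = parent[i]
        return i

    for i in range(len(records)):
        for j in range(i + 1, len(records)):
            if record_similarity(records[i], records[j]) >= threshold:
                parent[find(j)] = find(i)

    groups = {}
    for i in range(len(records)):
        groups.setdefault(find(i), []).append(i)
    return sorted(groups.values())


# Simulated records: the first two describe the same person, with a
# misspelled name and a missing telephone number in the second.
records = [
    {"sex": "M", "dob": "1970-01-15", "name": "Chan Tai Man",
     "marital_status": "M", "district": "Kowloon", "telephone": "23456789"},
    {"sex": "M", "dob": "1970-01-15", "name": "Chan Tai Mann",
     "marital_status": "M", "district": "Kowloon", "telephone": ""},
    {"sex": "F", "dob": "1985-06-30", "name": "Wong Siu Ling",
     "marital_status": "S", "district": "Sha Tin", "telephone": "29876543"},
]

print(group_records(records))  # [[0, 1], [2]]
```

With these assumed weights, the first two records score 0.838 and fall into one group, while the third stays separate. Raising the threshold or the telephone weight would split them, which shows how sensitive the grouping is to the weighting.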
