
Misplacing the Code: An Examination of Data Quality Issues in Bayesian Text Classification for Automated Coding of Medical Diagnoses

Author(s): Eitel J.M. Lauria (Marist College, USA) and Alan D. March (Universidad del Salvador, Argentina)
Copyright: 2007
Pages: 3
Source title: Managing Worldwide Operations and Communications with Information Technology
Source Editor(s): Mehdi Khosrow-Pour, D.B.A. (Information Resources Management Association, USA)
DOI: 10.4018/978-1-59904-929-8.ch296
ISBN13: 9781599049298
EISBN13: 9781466665378

Abstract

In this article we discuss the effect of dirty data on text mining for automated coding of medical diagnoses. Using two Bayesian machine learning algorithms (naive Bayes and shrinkage), we build ICD-9-CM classification models trained on free-text diagnoses. We investigate the effect of training the classifiers on both clean and (simulated) dirty data. The research focuses on the impact that erroneous labeling of training data sets has on the classifiers’ predictive accuracy.
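
As a rough illustration of the setup the abstract describes, the following Python sketch trains a multinomial naive Bayes text classifier and simulates dirty data by randomly replacing a fraction of the training labels with a wrong code. It is a minimal sketch under stated assumptions, not the authors' implementation: the function names (train_nb, predict_nb, corrupt_labels), the add-one smoothing, the uniform label-flipping noise model, and the six toy diagnosis strings are all hypothetical choices made for illustration. The shrinkage classifier is not shown.

```python
import math
import random
from collections import Counter, defaultdict


def train_nb(docs, labels):
    """Collect the counts needed for a multinomial naive Bayes model."""
    class_counts = Counter(labels)          # documents per code
    word_counts = defaultdict(Counter)      # token counts per code
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return class_counts, word_counts, vocab


def predict_nb(model, text):
    """Return the code with the highest log-posterior for a free-text diagnosis."""
    class_counts, word_counts, vocab = model
    total_docs = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label in sorted(class_counts):
        score = math.log(class_counts[label] / total_docs)
        # Add-one (Laplace) smoothing; an assumption, not taken from the paper.
        denom = sum(word_counts[label].values()) + len(vocab)
        for tok in text.lower().split():
            if tok in vocab:  # tokens never seen in training are ignored
                score += math.log((word_counts[label][tok] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def corrupt_labels(labels, rate, seed=0):
    """Simulate erroneous labeling: with probability `rate`, swap a label
    for a different code chosen uniformly at random (hypothetical noise model)."""
    rng = random.Random(seed)
    classes = sorted(set(labels))
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([c for c in classes if c != y]))
        else:
            noisy.append(y)
    return noisy


# Toy training set invented for illustration; not data from the paper.
docs = [
    "type 2 diabetes mellitus",
    "diabetes mellitus uncontrolled",
    "essential hypertension",
    "arterial hypertension benign",
    "pneumonia right lower lobe",
    "community acquired pneumonia",
]
labels = ["250.00", "250.00", "401.9", "401.9", "486", "486"]

clean_model = train_nb(docs, labels)
dirty_model = train_nb(docs, corrupt_labels(labels, rate=0.5))

for query in ["diabetes mellitus", "hypertension", "pneumonia"]:
    print(query, predict_nb(clean_model, query), predict_nb(dirty_model, query))
```

Comparing the two models' predictions on held-out diagnoses, at several values of rate, gives a simple way to observe how predictive accuracy changes as more training labels are wrong.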
