Misplacing the Code: An Examination of Data Quality Issues in Bayesian Text Classification for Automated Coding of Medical Diagnoses
Abstract
In this article we discuss the effect of dirty data on text mining for automated coding of medical diagnoses. Using two Bayesian machine learning algorithms (naive Bayes and shrinkage), we build ICD-9-CM classification models trained on free-text diagnoses. We investigate the effect of training the classifiers on both clean and (simulated) dirty data. The research focuses on the impact that erroneous labeling of training data sets has on the classifiers’ predictive accuracy.
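The kind of experiment the abstract describes can be illustrated with a minimal sketch. It is not the article's implementation: it covers only a multinomial naive Bayes classifier with Laplace smoothing (shrinkage is not shown), the diagnosis strings are invented toy data, and the function names `train_naive_bayes`, `predict`, `corrupt_labels`, and `accuracy` are hypothetical. Label errors are simulated by reassigning a chosen fraction of training labels to a different code at random, which is one simple noise model and only an assumption about how dirty data might be simulated.

```python
import math
import random
from collections import Counter, defaultdict

# Toy training data, invented for illustration only (not from the article).
TRAIN_DOCS = [
    "type 2 diabetes mellitus",
    "diabetes mellitus uncontrolled",
    "essential hypertension",
    "high blood pressure hypertension",
    "pneumonia right lower lobe",
    "community acquired pneumonia",
]
TRAIN_LABELS = ["250.00", "250.00", "401.9", "401.9", "486", "486"]

TEST_DOCS = ["diabetes type 2", "hypertension", "lobe pneumonia"]
TEST_LABELS = ["250.00", "401.9", "486"]


def train_naive_bayes(docs, labels, alpha=1.0):
    """Fit a multinomial naive Bayes model with Laplace smoothing (alpha)."""
    class_counts = Counter(labels)
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in zip(docs, labels):
        tokens = text.lower().split()
        word_counts[label].update(tokens)
        vocab.update(tokens)
    return {
        "class_counts": class_counts,
        "word_counts": word_counts,
        "vocab": vocab,
        "alpha": alpha,
        "n_docs": len(docs),
    }


def predict(model, text):
    """Return the code with the highest posterior log-probability."""
    # Tokens never seen in training are ignored.
    tokens = [t for t in text.lower().split() if t in model["vocab"]]
    vocab_size = len(model["vocab"])
    alpha = model["alpha"]
    best_label, best_score = None, -math.inf
    for label, count in model["class_counts"].items():
        counts = model["word_counts"][label]
        total = sum(counts.values())
        score = math.log(count / model["n_docs"])  # log prior
        for tok in tokens:
            score += math.log((counts[tok] + alpha) / (total + alpha * vocab_size))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


def corrupt_labels(labels, noise_rate, rng):
    """Simulate dirty data: with probability noise_rate, swap in a wrong code."""
    classes = sorted(set(labels))
    noisy = []
    for label in labels:
        if rng.random() < noise_rate:
            noisy.append(rng.choice([c for c in classes if c != label]))
        else:
            noisy.append(label)
    return noisy


def accuracy(model, docs, labels):
    """Fraction of documents whose predicted code matches the true code."""
    hits = sum(predict(model, d) == y for d, y in zip(docs, labels))
    return hits / len(docs)
```

To compare conditions, one would train once on `TRAIN_LABELS` and once on `corrupt_labels(TRAIN_LABELS, noise_rate, random.Random(seed))` for several noise rates, then report `accuracy` on the same clean test set each time. A toy corpus this small says nothing about the magnitude of the effect; it only shows the shape of the comparison.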