A Comparison of Revision Schemes for Cleaning Labeling Noise

Author(s): Chuck P. Lam (Lama Solutions LLC., USA) and David G. Stork (Ricoh Innovations, Inc., USA)
Copyright: 2008
Pages: 13
Source title: Mathematical Methods for Knowledge Discovery and Data Mining
Source Author(s)/Editor(s): Giovanni Felici (Consiglio Nazionale delle Ricerche, Italy) and Carlo Vercellis (Politecnico di Milano, Italy)
DOI: 10.4018/978-1-59904-528-3.ch013

Keywords: Data Mining and Databases / Information Science Reference / Knowledge Discovery / Library & Information Science

Abstract

Data quality is an important factor in building effective classifiers. One way to improve data quality is to clean labeling noise. Label cleaning can be divided into two stages. The first stage identifies samples with suspicious labels. The second stage processes the suspicious samples using a revision scheme. This chapter examines three such revision schemes: (1) removal of the suspicious samples, (2) automatic replacement of the suspicious labels with what the machine believes to be correct, and (3) escalation of the suspicious samples to a human supervisor for relabeling. Experimental and theoretical analyses show that only escalation is effective when the original labeling noise is very large or very small. Furthermore, for a wide range of situations, removal is better than automatic replacement.
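The two-stage structure described in the abstract can be illustrated with a minimal sketch. The abstract does not say how suspicious labels are identified, so the nearest-neighbour vote used for stage one here is purely an assumption made for illustration, as are all function names (`knn_vote`, `find_suspicious`, `revise`) and the toy data. The `ask_human` callable is a hypothetical stand-in for the human supervisor.

```python
from collections import Counter


def knn_vote(points, labels, i, k=3):
    """Majority label among the k nearest neighbours of sample i (excluding i)."""
    dists = sorted(
        (sum((a - b) ** 2 for a, b in zip(points[i], points[j])), j)
        for j in range(len(points))
        if j != i
    )
    votes = Counter(labels[j] for _, j in dists[:k])
    return votes.most_common(1)[0][0]


def find_suspicious(points, labels, k=3):
    """Stage 1 (assumed detector): map each suspicious index to the machine's guess."""
    suspicious = {}
    for i in range(len(points)):
        guess = knn_vote(points, labels, i, k)
        if guess != labels[i]:
            suspicious[i] = guess
    return suspicious


def revise(points, labels, suspicious, scheme, ask_human=None):
    """Stage 2: apply one of the three revision schemes to the flagged samples."""
    if scheme == "remove":
        keep = [i for i in range(len(points)) if i not in suspicious]
        return [points[i] for i in keep], [labels[i] for i in keep]
    if scheme == "replace":
        # Overwrite each suspicious label with the machine's own guess.
        return list(points), [suspicious.get(i, y) for i, y in enumerate(labels)]
    if scheme == "escalate":
        if ask_human is None:
            raise ValueError("escalation needs an ask_human callable")
        return list(points), [
            ask_human(i) if i in suspicious else y for i, y in enumerate(labels)
        ]
    raise ValueError(f"unknown scheme: {scheme}")


# Toy data: two 1-D clusters; the label at index 2 has been corrupted.
points = [(0.0,), (0.1,), (0.2,), (0.3,), (5.0,), (5.1,), (5.2,), (5.3,)]
true_labels = ["a", "a", "a", "a", "b", "b", "b", "b"]
noisy_labels = ["a", "a", "b", "a", "b", "b", "b", "b"]

suspicious = find_suspicious(points, noisy_labels)
removed_points, removed_labels = revise(points, noisy_labels, suspicious, "remove")
_, replaced_labels = revise(points, noisy_labels, suspicious, "replace")
_, escalated_labels = revise(
    points, noisy_labels, suspicious, "escalate", ask_human=lambda i: true_labels[i]
)
```

In this sketch, removal shrinks the training set, replacement trusts the machine's guess, and escalation spends human effort only on the flagged samples. It makes no attempt to reproduce the chapter's experiments or its comparison of the three schemes.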
