Partially Supervised Text Categorization

Author(s): Xiao-Li Li (Institute for Infocomm Research, A*STAR, Singapore)
Copyright: 2009
Pages: 21
Source title: Handbook of Research on Text and Web Mining Technologies
Source Author(s)/Editor(s): Min Song (New Jersey Institute of Technology, USA) and Yi-Fang Brook Wu (New Jersey Institute of Technology, USA)
DOI: 10.4018/978-1-59904-990-8.ch005

Keywords: Data Mining / Data Mining and Databases / Information Science Reference / Library & Information Science

Abstract

In traditional text categorization, a classifier is built from labeled training documents drawn from a set of predefined classes. This chapter studies a different problem: partially supervised text categorization. Given a set P of positive documents of a particular class and a set U of unlabeled documents (which contains both hidden positive and hidden negative documents), we build a classifier using P and U to classify the data in U as well as future test data. The key feature of this problem is that there are no labeled negative documents, which makes traditional text classification techniques inapplicable. In this chapter, we introduce the main techniques for solving the partially supervised problem: S-EM, PEBL, Roc-SVM, and A-EM. In many application domains, partially supervised text categorization is preferred because it avoids the labor-intensive manual labeling of negative documents.
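
To make the P-and-U setting concrete, the following is a minimal Python sketch of one generic two-step approach: first treat every document in U as negative to find "reliable negatives", then retrain on P against those reliable negatives only. It is an illustration under stated assumptions, not an implementation of S-EM, PEBL, Roc-SVM, or A-EM. The function names (train_nb, prob_positive, two_step_pu), the naive Bayes model, and the 0.2 threshold are all hypothetical choices made for this example.

```python
import math
from collections import Counter

def train_nb(docs, labels):
    """Train a multinomial naive Bayes model (Laplace smoothing) on tokenized docs.
    Assumes both labels 0 and 1 occur at least once."""
    vocab = {w for d in docs for w in d}
    counts = {0: Counter(), 1: Counter()}
    n_docs = Counter(labels)
    for doc, y in zip(docs, labels):
        counts[y].update(doc)
    model = {"vocab": vocab, "prior": {}, "loglik": {}}
    for y in (0, 1):
        total = sum(counts[y].values())
        model["prior"][y] = math.log(n_docs[y] / len(docs))
        model["loglik"][y] = {
            w: math.log((counts[y][w] + 1) / (total + len(vocab))) for w in vocab
        }
    return model

def prob_positive(model, doc):
    """Posterior probability that a tokenized doc is positive; unseen words are ignored."""
    scores = {}
    for y in (0, 1):
        scores[y] = model["prior"][y] + sum(
            model["loglik"][y][w] for w in doc if w in model["vocab"]
        )
    top = max(scores.values())  # subtract the max for numerical stability
    exp = {y: math.exp(s - top) for y, s in scores.items()}
    return exp[1] / (exp[0] + exp[1])

def two_step_pu(P, U, threshold=0.2):
    """Hypothetical two-step learner for positive (P) and unlabeled (U) documents."""
    # Step 1: provisionally label all of U as negative and train a first classifier.
    first = train_nb(P + U, [1] * len(P) + [0] * len(U))
    # Unlabeled documents scored as very unlikely positive become reliable negatives.
    reliable_neg = [d for d in U if prob_positive(first, d) < threshold]
    if not reliable_neg:
        return first  # nothing confidently negative; fall back to the first model
    # Step 2: retrain using only P and the reliable negatives.
    return train_nb(P + reliable_neg, [1] * len(P) + [0] * len(reliable_neg))
```

The returned model can score documents in U as well as future test documents via prob_positive(model, doc). The threshold is deliberately conservative so that hidden positives in U are less likely to be mistaken for negatives; in practice it would need tuning.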
