IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

How Size Matters: The Role of Sampling in Data Mining

Author(s): Paul D. Scott (University of Essex, UK)
Copyright: 2002
Pages: 20
Source title: Heuristic and Optimization for Knowledge Discovery
Source Author(s)/Editor(s): Hussein A. Abbass (University of New South Wales, Australia), Charles S. Newton (University of New South Wales, Australia) and Ruhul Sarker (University of New South Wales, Australia)
DOI: 10.4018/978-1-930708-26-6.ch008

Abstract

This chapter addresses the question of how to decide how large a sample is necessary in order to apply a particular data mining procedure to a given data set. A brief review of the main results of basic sampling theory is followed by a detailed consideration and comparison of the impact of simple random sample size on two well-known data mining procedures: naïve Bayes classifiers and decision tree induction. It is shown that both the learning procedure and the data set have a major impact on the size of sample required but that the size of the data set itself has little effect. The next section introduces a more sophisticated form of sampling, disproportionate stratification, and shows how it may be used to make much more effective use of limited processing resources. This section also includes a discussion of dynamic and static sampling. An examination of the impact of target function complexity concludes that neither target function complexity nor size of the attribute tuple space need be considered explicitly in determining sample size. The chapter concludes with a summary of the major results, a consideration of their relevance for small data sets and some brief remarks on the role of sampling for other data mining procedures.
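The claim that the size of the data set itself has little effect on the required sample size can be illustrated with a standard result from basic sampling theory. The sketch below is not taken from the chapter: it assumes the textbook formula for the simple random sample size needed to estimate a proportion, with the finite population correction, and the function name `required_sample_size` and its parameters are hypothetical choices made for illustration.

```python
import math


def required_sample_size(margin, p=0.5, z=1.96, population=None):
    """Simple random sample size needed to estimate a proportion.

    margin     -- desired half-width of the confidence interval (e.g. 0.01)
    p          -- assumed true proportion (0.5 is the worst case)
    z          -- normal quantile for the confidence level (1.96 ~ 95%)
    population -- size of the data set; None means "effectively infinite"

    Assumption: the textbook normal-approximation formula, not a method
    taken from the chapter itself.
    """
    # Sample size for an infinite population.
    n0 = (z ** 2) * p * (1 - p) / margin ** 2
    if population is None:
        return math.ceil(n0)
    # Finite population correction: shrinks n0 only when the
    # population is not much larger than n0.
    return math.ceil(n0 / (1 + (n0 - 1) / population))


if __name__ == "__main__":
    for size in (10_000, 1_000_000, 100_000_000, None):
        print(size, required_sample_size(0.01, population=size))
```

Under these assumptions, the required sample grows with the data set only while the data set is of comparable size to the sample; beyond that it levels off at the infinite-population value, which is the sense in which data set size matters little.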
