Cluster Analysis in Fitting Mixtures of Curves

Author(s): Tom Burr (Los Alamos National Laboratory, USA)
Copyright: 2005
Pages: 5
Source title: Encyclopedia of Data Warehousing and Mining
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-59140-557-3.ch030

Keywords: Data Mining and Databases / Data Warehousing / Information Science Reference / Library & Information Science

Abstract

Cluster analysis is a data mining activity that comes in several types. One type deserving special attention is clustering that arises from a mixture of curves. A mixture distribution is a combination of two or more distributions. For example, a bimodal distribution could be a mixture in which 30% of the values are generated from one unimodal distribution and 70% from a second unimodal distribution. The special type of mixture considered here is a mixture of curves in a two-dimensional scatter plot. Imagine a collection of hundreds or thousands of scatter plots, each containing a few hundred points. Each plot includes background noise and anywhere from zero to four or five bands of points, each band having a curved shape. In a recent application (Burr et al., 2001), each curved band of points was a potential thunderstorm event (see Figure 1), as observed from a distant satellite, and the goal was to cluster the points into groups associated with thunderstorm events. Each curve has its own shape, length, and location, and the curves vary in their degree of overlap, point density, and noise magnitude. Points from a curve with small noise trace a smooth curve with very little vertical scatter, but noise magnitude varies widely, so some events show large vertical variation about the center of the band. In this context, each curve is a cluster, and the challenge is to use only the observations to estimate how many curves make up the mixture, along with their shapes and locations. To achieve that goal, a human analyst could train a classifier by visually assigning cluster labels to all points in example scatter plots. Each point would belong either to a curved region or to a catch-all noise category, and a specialized cluster analysis would be used to develop an approach for labeling (clustering) the points generated by the same mechanism in future scatter plots.
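
The two ideas in the abstract, a 30%/70% mixture of unimodal distributions and a scatter plot made of curved bands plus background noise, can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the method of Burr et al. (2001): the Gaussian components, the two quadratic curve shapes, the noise level, and the function names `sample_mixture` and `simulate_scatter` are all hypothetical choices made only for illustration.

```python
import random


def sample_mixture(n, weights=(0.3, 0.7), means=(0.0, 5.0), sds=(1.0, 1.0), seed=0):
    """Draw n values from a two-component Gaussian mixture.

    Returns (values, labels), where labels[i] is the component (0 or 1)
    that generated values[i]. The Gaussian form is an assumption; the
    abstract only says each component is unimodal.
    """
    rng = random.Random(seed)
    values, labels = [], []
    for _ in range(n):
        # Pick component 0 with probability weights[0], otherwise component 1.
        k = 0 if rng.random() < weights[0] else 1
        values.append(rng.gauss(means[k], sds[k]))
        labels.append(k)
    return values, labels


def simulate_scatter(n_per_curve=200, n_noise=100, noise_sd=0.05, seed=0):
    """Simulate one scatter plot: two curved bands plus background noise.

    Returns a shuffled list of (x, y, label) tuples. Label 0 marks the
    catch-all noise category; labels 1 and 2 mark the two curves.
    """
    rng = random.Random(seed)
    # Hypothetical curve shapes, chosen only so that the bands are curved.
    curves = [lambda x: 0.2 + 0.5 * x ** 2, lambda x: 0.9 - 0.6 * x ** 2]
    points = []
    for label, curve in enumerate(curves, start=1):
        for _ in range(n_per_curve):
            x = rng.random()
            # Vertical variation about the center of the band.
            y = curve(x) + rng.gauss(0.0, noise_sd)
            points.append((x, y, label))
    for _ in range(n_noise):
        # Background noise spread uniformly over the unit square.
        points.append((rng.random(), rng.random(), 0))
    rng.shuffle(points)
    return points
```

The labels returned by `simulate_scatter` play the role of the hand-assigned cluster labels described above: a clustering method would see only the `(x, y)` pairs and would be scored against the known labels. Increasing `noise_sd` widens each band and makes the two curves harder to separate.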
