Estimation of Distribution Algorithms for Feature Subset Selection in Large Dimensionality Domains

Author(s): Inaki Inza (University of the Basque Country, Spain), Pedro Larranaga (University of the Basque Country, Spain) and Basilio Sierra (University of the Basque Country, Spain)
Copyright: 2002
Pages: 20
Source title: Data Mining: A Heuristic Approach
Source Author(s)/Editor(s): Hussein A. Abbass (University of New South Wales, Australia), Ruhul Sarker (University of New South Wales, Australia) and Charles S. Newton (University of New South Wales, Australia)
DOI: 10.4018/978-1-930708-25-9.ch005

Keywords: Data Mining / Data Mining and Databases / Information Science Reference / Library & Information Science

Abstract

Feature Subset Selection (FSS) is a well-known task in Machine Learning, Data Mining, Pattern Recognition, and Text Learning. Genetic Algorithms (GAs) are possibly the most commonly used algorithms for FSS. Although the FSS literature contains many papers, few of them tackle FSS in domains with more than 50 features. In this chapter we present a novel search heuristic paradigm, called Estimation of Distribution Algorithms (EDAs), as an alternative to GAs for performing a population-based, randomized search in datasets of large dimensionality. The EDA paradigm avoids the use of genetic crossover and mutation operators to evolve the populations. In the absence of these operators, evolution is guaranteed by factorizing the probability distribution of the best solutions found in a generation of the search and then simulating this distribution to obtain a new pool of solutions. We present four different probabilistic models to perform this factorization. In a comparison with two types of GAs on natural and artificial datasets of large dimensionality, EDA-based approaches obtain encouraging accuracy results and need fewer evaluations than the genetic approaches.
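
The estimate-then-sample loop described in the abstract can be illustrated with a minimal sketch. It assumes the simplest possible factorization, in which every feature is treated as independent (a univariate model); the abstract does not name the chapter's four probabilistic models, so this is not presented as any of them. The function names, parameter defaults, probability clamp, and toy fitness function are all hypothetical. In a real wrapper setting the fitness would typically be the estimated accuracy of a classifier trained on the selected features.

```python
import random


def univariate_eda(fitness, n_features, pop_size=50, n_select=25,
                   generations=30, seed=0):
    """Search for a feature mask (list of 0/1) that maximizes `fitness`.

    Illustrative univariate EDA: no crossover or mutation is used.
    """
    rng = random.Random(seed)
    # Initially every feature is equally likely to be included.
    probs = [0.5] * n_features
    best_mask, best_score = None, float("-inf")

    for _ in range(generations):
        # Simulation step: sample a new pool of candidate subsets.
        population = [[1 if rng.random() < p else 0 for p in probs]
                      for _ in range(pop_size)]

        # Evaluate and rank the candidates, best first.
        scored = sorted(((fitness(mask), mask) for mask in population),
                        key=lambda pair: pair[0], reverse=True)
        if scored[0][0] > best_score:
            best_score, best_mask = scored[0]

        # Estimation step: the joint distribution of the selected
        # solutions is factorized as a product of per-feature marginals.
        selected = [mask for _, mask in scored[:n_select]]
        probs = [sum(mask[i] for mask in selected) / n_select
                 for i in range(n_features)]

        # Assumption: clamp probabilities so no feature becomes fixed
        # permanently (a common practical safeguard, not from the source).
        probs = [min(max(p, 0.05), 0.95) for p in probs]

    return best_mask, best_score


def toy_fitness(mask):
    """Hypothetical stand-in for classifier accuracy.

    Features 0..4 are 'relevant' (+1 each); every other selected
    feature costs 0.1.
    """
    relevant = sum(mask[:5])
    irrelevant = sum(mask[5:])
    return relevant - 0.1 * irrelevant


if __name__ == "__main__":
    mask, score = univariate_eda(toy_fitness, n_features=20)
    print("selected features:", [i for i, bit in enumerate(mask) if bit])
    print("score:", score)
```

Models that capture dependencies between features would replace only the estimation and sampling steps; the surrounding loop stays the same.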