Information Resources Management Association

Discretization for Continuous Attributes

Author(s): Fabrice Muhlenbach (EURISE, Université Jean Monnet - Saint-Etienne, France) and Ricco Rakotomalala (ERIC, Université Lumière - Lyon 2, France)
Copyright: 2005
Pages: 6
Source title: Encyclopedia of Data Warehousing and Mining
Source Author(s)/Editor(s): John Wang (Montclair State University, USA)
DOI: 10.4018/978-1-59140-557-3.ch076

Abstract

In the data-mining field, many learning methods, such as association rules, Bayesian networks, and induction rules (Grzymala-Busse & Stefanowski, 2001), can handle only discrete attributes. Therefore, before the machine-learning process, each continuous attribute must be re-encoded as a discrete attribute made up of a set of intervals. For example, the age attribute can be transformed into two discrete values representing two intervals: less than 18 (a minor) and 18 or greater. This process, known as discretization, is an essential data-preprocessing task, not only because some learning methods cannot handle continuous attributes, but also for other important reasons. Data transformed into a set of intervals are easier for humans to interpret (Liu, Hussain, Tan, & Dash, 2002). Computation is faster on reduced data, particularly when attributes for which no relevant cut can be found are removed from the representation space of the learning problem (Mittal & Cheong, 2002). Discretization can also capture nonlinear relations: for example, infants and elderly people are more sensitive to illness, so the relation between age and illness is not linear. This is why many authors propose discretizing the data even when the learning method can handle continuous attributes (Frank & Witten, 1999). Lastly, discretization can harmonize heterogeneous data: for example, in text categorization, the attributes are a mix of numerical values and occurrences of terms (Macskassy, Hirsh, Banerjee, & Dayanik, 2001).
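
The age example in the abstract can be illustrated with a minimal Python sketch. It is not taken from the chapter: the function name `discretize`, its parameters, the sample ages, and the interval labels are all hypothetical, and the cut point of 18 is simply the one mentioned in the abstract. The sketch only shows how a continuous value is mapped to one of a fixed set of intervals once cut points are known; it does not show how a discretization method would choose those cut points.

```python
from bisect import bisect_right


def discretize(values, cut_points, labels):
    """Map each continuous value to the label of the interval it falls in.

    cut_points must be sorted in increasing order. With k cut points there
    are k + 1 intervals; a value equal to a cut point goes to the interval
    on its right (so 18 falls in "18 or greater" when the cut is at 18).
    """
    if len(labels) != len(cut_points) + 1:
        raise ValueError("need exactly one more label than cut points")
    return [labels[bisect_right(cut_points, v)] for v in values]


# Hypothetical sample data: ages as a continuous attribute.
ages = [3, 17, 18, 42, 90]

# One cut point at 18 gives the two intervals from the abstract.
age_classes = discretize(ages, cut_points=[18], labels=["less than 18", "18 or greater"])
print(age_classes)
```

Adding more cut points (for instance, one separating infants and one separating elderly people) would yield more intervals with the same function, which is one way a nonlinear relation such as the one between age and illness could be exposed to a learning method.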
