IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Minimum Database Determination and Preprocessing for Machine Learning

Author(s): Angel Fernando Kuri-Morales (ITAM, Mexico)
Copyright: 2019
Pages: 38
Source title: Innovative Solutions and Applications of Web Services Technology
Source Author(s)/Editor(s): Liang-Jie Zhang (Kingdee International Software Group Co., Ltd., China) and Yishuang Ning (Tsinghua University, China)
DOI: 10.4018/978-1-5225-7268-8.ch005

Abstract

Exploiting large databases requires a costly investment in both storage and processing time. To assess the data correctly, pre-processing steps must be taken before analysis. In particular, categorical data must be transformed by adequately encoding every instance of each categorical variable, and the encoding must preserve the actual patterns in the data while avoiding the introduction of non-existent ones. The authors discuss CESAMO, an algorithm that statistically identifies pattern-preserving codes. The resulting database is more economical, and the approach may encompass mixed databases. Thus, they obtain an optimal transformed representation that is considerably more compact without impairing its informational content. To establish the equivalence of the original data set (FD) and the reduced data set (RD), they apply an algorithm (AA) that relies on multivariate regression. Through the combined application of CESAMO and AA, the equivalent behavior of FD and RD may be established with a high degree of statistical certainty.
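
The idea of searching for numeric codes that preserve the patterns in a categorical variable can be illustrated with a toy sketch. This is not CESAMO itself, whose details the abstract does not give. The random search, the simple linear fit, the toy data, and the names `fit_error` and `search_codes` are all assumptions made for illustration only.

```python
import random


def fit_error(xs, ys):
    """Mean squared error of the least-squares line y = a + b*x."""
    n = len(xs)
    mx = sum(xs) / n
    my = sum(ys) / n
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    b = sxy / sxx if sxx else 0.0
    a = my - b * mx
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys)) / n


def search_codes(categories, target, trials=200, seed=0):
    """Hypothetical helper: try random codes in [0, 1) for each category
    level and keep the assignment whose encoded variable is best explained
    by a linear fit against a numeric variable."""
    rng = random.Random(seed)
    levels = sorted(set(categories))
    best_codes, best_err = None, float("inf")
    for _ in range(trials):
        codes = {level: rng.random() for level in levels}
        xs = [codes[c] for c in categories]
        err = fit_error(xs, target)
        if err < best_err:
            best_codes, best_err = codes, err
    return best_codes, best_err


# Toy data (invented for this sketch): one categorical and one numeric column.
categories = ["low", "mid", "high", "low", "high", "mid", "low", "high"]
target = [1.0, 2.1, 3.2, 0.9, 2.9, 2.0, 1.1, 3.0]

codes, err = search_codes(categories, target)
print(codes, err)
```

An arbitrary assignment such as alphabetical integer codes would impose an ordering the data may not support. Scoring candidate codes against the other variables is one simple way to avoid that, under the assumptions stated above.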
