Minimum Database Determination and Preprocessing for Machine Learning
Abstract
Exploiting large databases requires expensive resources in both storage and processing time. Assessing the data correctly requires pre-processing steps before analysis. In particular, categorical data must be transformed by adequately encoding every instance of each categorical variable, and the encoding must preserve the actual patterns in the data without introducing non-existent ones. The authors discuss CESAMO, an algorithm that statistically identifies pattern-preserving codes. The resulting database is more economical, and the approach can accommodate mixed databases. The authors thus obtain an optimal transformed representation that is considerably more compact without impairing its informational content. To establish the equivalence of the original data set (FD) and the reduced data set (RD), they apply an algorithm (AA) that relies on multivariate regression. Through the combined application of CESAMO and AA, the equivalent behavior of FD and RD may be guaranteed with a high degree of statistical certainty.
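The idea of choosing numeric codes that preserve existing patterns can be illustrated with a small sketch. This is not the authors' CESAMO algorithm, whose details the abstract does not give. It is a simplified stand-in that assumes a random search over candidate codes and a linear least-squares fit against one numeric column. The function name `search_codes` and its parameters are hypothetical.

```python
import numpy as np

def search_codes(categories, numeric, n_trials=500, seed=0):
    """Illustrative only (not CESAMO itself): draw random codes in [0, 1)
    for each category level and keep the set under which a numeric column
    is best approximated by a linear function of the encoded column."""
    rng = np.random.default_rng(seed)
    numeric = np.asarray(numeric, dtype=float)
    levels = sorted(set(categories))
    # Position of each row's category within `levels`.
    index = np.array([levels.index(c) for c in categories])
    best_codes, best_err = None, np.inf
    for _ in range(n_trials):
        codes = rng.random(len(levels))   # one candidate code per level
        x = codes[index]                  # encoded categorical column
        coeffs = np.polyfit(x, numeric, 1)  # least-squares line numeric ~ x
        residual = np.polyval(coeffs, x) - numeric
        err = float(np.sqrt(np.mean(residual ** 2)))  # RMS error of the fit
        if err < best_err:
            best_codes = {lvl: float(c) for lvl, c in zip(levels, codes)}
            best_err = err
    return best_codes, best_err

# Hypothetical toy data: the numeric column grows from "a" to "b" to "c".
cats = ["a", "a", "b", "b", "c", "c"]
vals = [1.0, 1.0, 2.0, 2.0, 3.0, 3.0]
codes, err = search_codes(cats, vals)
print(codes, err)
```

In this toy setting the lowest-error codes place "b" between "a" and "c", so the encoding reflects an ordering already present in the data rather than imposing an arbitrary one. A real implementation would compare against several variables and use a statistical stopping criterion, which this sketch omits.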