IRMA-International.org: Creator of Knowledge
Information Resources Management Association
Advancing the Concepts & Practices of Information Resources Management in Modern Organizations

Applying the K-Means Algorithm in Big Raw Data Sets with Hadoop and MapReduce

Author(s): Ilias K. Savvas (TEI of Larissa, Greece), Georgia N. Sofianidou (TEI of Larissa, Greece), and M-Tahar Kechadi (University College Dublin, Ireland)
Copyright: 2016
Pages: 24
Source title: Business Intelligence: Concepts, Methodologies, Tools, and Applications
Source Author(s)/Editor(s): Information Resources Management Association (USA)
DOI: 10.4018/978-1-4666-9562-7.ch062


Abstract

Big data refers to data sets whose size is beyond the capabilities of most current hardware and software technologies. The Apache Hadoop software library is a framework for distributed processing of large data sets; HDFS is a distributed file system that provides high-throughput access to application data, and MapReduce is a software framework for distributed computation over large data sets. Huge collections of raw data require fast and accurate mining processes in order to extract useful knowledge. One of the most popular data mining techniques is the K-means clustering algorithm. In this study, the authors develop a distributed version of the K-means algorithm using the MapReduce framework on the Hadoop Distributed File System. The theoretical and experimental results demonstrate the efficiency of the technique, indicating that HDFS and MapReduce can be applied to big data with very promising results.
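
The abstract does not reproduce the authors' implementation, so the following is only a minimal sketch of how K-means is commonly split into map and reduce steps. It runs in a single Python process instead of on Hadoop, and the function names (`map_point`, `reduce_cluster`, `kmeans_iteration`, `kmeans`) are hypothetical, chosen for illustration. The map step assigns each point to its nearest centroid, an in-memory dictionary stands in for the shuffle phase, and the reduce step averages the points of each cluster to produce the new centroid.

```python
from collections import defaultdict
from math import dist  # Euclidean distance, Python 3.8+


def map_point(point, centroids):
    """Map step: emit (index of nearest centroid, (point, 1))."""
    nearest = min(range(len(centroids)), key=lambda i: dist(point, centroids[i]))
    return nearest, (point, 1)


def reduce_cluster(values):
    """Reduce step: average all points that were assigned to one centroid."""
    total = [0.0] * len(values[0][0])
    count = 0
    for point, n in values:
        total = [t + p for t, p in zip(total, point)]
        count += n
    return tuple(t / count for t in total)


def kmeans_iteration(points, centroids):
    """One map/shuffle/reduce round; returns the updated centroids."""
    grouped = defaultdict(list)  # assumption: stands in for Hadoop's shuffle phase
    for point in points:
        key, value = map_point(point, centroids)
        grouped[key].append(value)
    # A centroid that attracted no points keeps its previous position.
    return [
        reduce_cluster(grouped[i]) if i in grouped else tuple(centroids[i])
        for i in range(len(centroids))
    ]


def kmeans(points, centroids, max_iter=20, tol=1e-9):
    """Repeat rounds until no centroid moves more than tol, or max_iter is hit."""
    for _ in range(max_iter):
        updated = kmeans_iteration(points, centroids)
        if all(dist(a, b) <= tol for a, b in zip(updated, centroids)):
            return updated
        centroids = updated
    return centroids


if __name__ == "__main__":
    data = [(0, 0), (0, 2), (10, 10), (10, 12)]
    print(kmeans(data, [(0, 0), (10, 10)]))
```

In an actual Hadoop job, each round would typically be a separate MapReduce job, with the current centroids distributed to every mapper and the reducer output fed back in as the centroids for the next round; those details are assumptions here, not a description of the chapter's design.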
