
Data-Aware Distributed Computing

Author(s): Esma Yildirim (State University of New York at Buffalo (SUNY), USA), Mehmet Balman (Lawrence Berkeley National Laboratory, USA), and Tevfik Kosar (State University of New York at Buffalo (SUNY), USA)
Copyright: 2012
Pages: 27
Source title: Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management
Source Author(s)/Editor(s): Tevfik Kosar (University at Buffalo, USA)
DOI: 10.4018/978-1-61520-971-2.ch001

Abstract

With the continuous increase in the data requirements of scientific and commercial applications, access to remote and distributed data has become a major bottleneck for end-to-end application performance. Traditional distributed computing systems closely couple data access and computation, and data access is generally treated as a side effect of computation. The limitations of these traditional systems, and of CPU-oriented scheduling and workflow management tools, in handling complex data management tasks have motivated a newly emerging paradigm: data-aware distributed computing. In this chapter, the authors elaborate on how the most crucial distributed computing components, such as scheduling, workflow management, and end-to-end throughput optimization, can become “data-aware.” In this new computing paradigm, data placement activities are represented as full-featured jobs in the end-to-end workflow, and they are queued, managed, scheduled, and optimized via a specialized data-aware scheduler. As part of this paradigm, the authors present a set of tools for mitigating the data bottleneck in distributed computing systems, consisting of three main components: a data-aware scheduler, which provides capabilities such as planning, scheduling, resource reservation, job execution, and error recovery for data movement tasks; integration of these capabilities into the other layers of distributed computing, such as workflow planning; and further optimization of data movement tasks via dynamic tuning of the underlying transfer protocol parameters.

Related Content

Radhika Kavuri, Satya kiranmai Tadepalli. © 2024. 19 pages.
Ramu Kuchipudi, Ramesh Babu Palamakula, T. Satyanarayana Murthy. © 2024. 10 pages.
Nidhi Niraj Worah, Megharani Patil. © 2024. 21 pages.
Vishal Goar, Nagendra Singh Yadav. © 2024. 23 pages.
S. Boopathi. © 2024. 24 pages.
Sai Samin Varma Pusapati. © 2024. 25 pages.
Swapna Mudrakola, Krishna Keerthi Chennam, Shitharth Selvarajan. © 2024. 11 pages.