Towards Data Intensive Many-Task Computing

Author(s): Ioan Raicu (Illinois Institute of Technology, USA & Argonne National Laboratory, USA), Ian Foster (University of Chicago, USA & Argonne National Laboratory, USA), Yong Zhao (University of Electronic Science and Technology of China, China), Alex Szalay (Johns Hopkins University, USA), Philip Little (University of Notre Dame, USA), Christopher M. Moretti (University of Notre Dame, USA), Amitabh Chaudhary (University of Notre Dame, USA) and Douglas Thain (University of Notre Dame, USA)
Copyright: 2012
Pages: 46
Source title: Data Intensive Distributed Computing: Challenges and Solutions for Large-scale Information Management
Source Author(s)/Editor(s): Tevfik Kosar (University at Buffalo, USA)
DOI: 10.4018/978-1-61520-971-2.ch002

Keywords: Computer Science & IT / Grid & High Performance Computing / High Performance Computing / Information Science Reference

Abstract

Many-task computing aims to bridge the gap between two computing paradigms: high-throughput computing and high-performance computing. Traditional techniques to support many-task computing commonly found in scientific computing (i.e., the reliance on parallel file systems with static configurations) do not scale to today’s largest systems for data-intensive applications, as the rate of increase in the number of processors per system is outgrowing the rate of performance increase of parallel file systems. In this chapter, the authors argue that in such circumstances, data locality is critical to the successful and efficient use of large distributed systems for data-intensive applications. They propose a “data diffusion” approach to enable data-intensive many-task computing. They define an abstract model for data diffusion, define and implement scheduling policies with heuristics that optimize real-world performance, and develop a competitive online cache eviction policy. They also present empirical experiments that explore the benefits of data diffusion under both static and dynamic resource provisioning, demonstrating approaches that improve both performance and scalability.
