2008 • In Huang, Zhiyi; Xu, Zhiwei; Rountree, Nathan et al. (Eds.) Ninth International Conference On Parallel And Distributed Computing, Applications And Technologies : PDCAT 2008
In this paper, the deployment and execution of Iterative Stencil applications on a P2P Grid middleware are investigated. So-called Iterative Stencil applications are composed of sets of heavily communicating, long-running Tasks. They therefore require the co-allocation of multiple reliable resources for extended periods of time.
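To make this communication pattern concrete, the following minimal Python sketch splits a 1D domain into Tasks and applies a 3-point averaging stencil. It is an illustration under stated assumptions, not code from LaBoGrid: the 1D domain, the averaging stencil, the fixed boundary values and the sequential simulation of Tasks in one process are all assumptions made for brevity, and the names `split`, `step` and `run` are hypothetical. Before every iteration each Task needs one boundary ("halo") cell from each neighboring Task; in a real deployment every such exchange is a network message, which is why all Tasks must run at the same time and stay alive together.

```python
"""Minimal sketch of an Iterative Stencil decomposed into communicating Tasks.

Assumptions (not from the paper): 1D domain, 3-point averaging stencil,
fixed boundary values, Tasks simulated sequentially in one process.
"""
from typing import List


def split(domain: List[float], n_tasks: int) -> List[List[float]]:
    """Partition the domain into n_tasks contiguous, non-empty subdomains."""
    if not 1 <= n_tasks <= len(domain):
        raise ValueError("need 1 <= n_tasks <= len(domain)")
    size = len(domain) // n_tasks
    parts = [domain[i * size:(i + 1) * size] for i in range(n_tasks - 1)]
    parts.append(domain[(n_tasks - 1) * size:])  # last Task takes the remainder
    return parts


def step(parts: List[List[float]], left_bc: float, right_bc: float) -> List[List[float]]:
    """One iteration: halo exchange between neighboring Tasks, then local update."""
    new_parts = []
    for i, part in enumerate(parts):
        # Halo exchange: one boundary cell from each neighbor (a message in practice).
        left = parts[i - 1][-1] if i > 0 else left_bc
        right = parts[i + 1][0] if i < len(parts) - 1 else right_bc
        padded = [left] + part + [right]
        # 3-point stencil: new value is the mean of the cell and its two neighbors.
        new_parts.append([(padded[j - 1] + padded[j] + padded[j + 1]) / 3.0
                          for j in range(1, len(padded) - 1)])
    return new_parts


def run(domain: List[float], n_tasks: int, iterations: int,
        left_bc: float = 0.0, right_bc: float = 0.0) -> List[float]:
    """Run the stencil for a number of iterations and reassemble the domain."""
    parts = split(domain, n_tasks)
    for _ in range(iterations):
        parts = step(parts, left_bc, right_bc)
    return [x for part in parts for x in part]
```

Because the halo exchange hands each Task exactly the values the global stencil would read, `run(domain, 1, k)` and `run(domain, 4, k)` produce identical results; only the amount of inter-Task communication differs.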
P2P Grids are totally decentralized and provide on-demand, transparent access to edge resources, e.g. Internet-connected, non-dedicated desktop computers. A P2P Grid has the potential to provide access to a large number of resources at a fraction of the cost of a dedicated cluster. However, edge resources are heterogeneous in performance and intrinsically unreliable: Task execution failures are common due to resource preemption or resource failure. Furthermore, P2P Grid schedulers usually target sets of independent computational Tasks, i.e. so-called Bag of Tasks applications. It is therefore not trivial to deploy and run an Iterative Stencil application on a P2P Grid.
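A back-of-the-envelope calculation shows why unreliable edge resources make co-allocation hard. Assuming, purely for illustration, that each of the n co-allocated resources fails independently with probability q during one complete run, the whole run survives with probability (1 - q)^n, and restarting from scratch after every failure takes on average 1 / (1 - q)^n attempts (the mean of a geometric distribution). The independence assumption and the function names below are hypothetical.

```python
"""Why co-allocating unreliable resources is hard: a back-of-the-envelope model.

Assumption (illustrative only): each resource fails independently with
probability p_fail during one complete run.
"""


def all_tasks_survive(n_tasks: int, p_fail: float) -> float:
    """Probability that none of the n_tasks co-allocated resources fails."""
    return (1.0 - p_fail) ** n_tasks


def expected_attempts(n_tasks: int, p_fail: float) -> float:
    """Expected number of runs until one completes, if every failure forces a
    restart from scratch (mean of a geometric distribution)."""
    return 1.0 / all_tasks_survive(n_tasks, p_fail)
```

With q = 0.05 and 32 Tasks, the survival probability drops to about 0.19, i.e. about 5.2 attempts on average, whereas a single Task would succeed 95% of the time. Under these assumptions the cost of restarting from scratch grows exponentially with the number of Tasks, which is what fault-tolerance mechanisms such as checkpointing aim to avoid.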
Checkpointing is a common fault-tolerance mechanism in High Performance Distributed Computing, often based on a centralized architecture. Locality-aware co-allocation in P2P Grids has recently been investigated. Checkpointing and locality-aware co-allocation have yet to be integrated into P2P Grids.
We propose to provide co-allocation through an existing middleware-level Bag of Tasks scheduling mechanism. We also introduce a layer of fault-tolerance for the Iterative Stencils that relies on a scalable, application-level, P2P checkpointing mechanism. Finally, LBG-SQUARE is described. This software results from the combination of a specific Iterative Stencil application (a Computational Fluid Dynamics simulation software called LaBoGrid) with a P2P Grid middleware (Lightweight Bartering Grid).
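As a rough illustration of what an application-level checkpointing layer without a central server can look like, the Python sketch below stores each checkpoint on peers only. It rests on hypothetical assumptions and does not describe the actual LBG-SQUARE protocol: one Task per peer, at least two Tasks, Task i keeping a local copy of its checkpoint and pushing a replica to the peer hosting Task i + 1 (its "buddy"), coordinated checkpoints taken by all Tasks at the same iteration, and a spare peer obtained, for instance, from the Bag of Tasks scheduler. The names `Peer`, `take_checkpoint` and `recover` are invented for illustration.

```python
"""Hypothetical sketch of peer-to-peer (buddy) checkpointing for co-allocated Tasks.

Assumptions: one Task per peer, at least two Tasks, coordinated checkpoints.
This is an illustration, not the LBG-SQUARE protocol.
"""
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

Checkpoint = Tuple[int, List[float]]  # (iteration, copy of the Task's subdomain)


@dataclass
class Peer:
    name: str
    alive: bool = True
    local: Dict[int, Checkpoint] = field(default_factory=dict)     # this peer's own Task
    replicas: Dict[int, Checkpoint] = field(default_factory=dict)  # held for other Tasks


def take_checkpoint(peers: List[Peer], placement: Dict[int, int],
                    states: Dict[int, List[float]], iteration: int) -> None:
    """Each Task keeps a local copy and pushes a replica to its buddy peer."""
    n = len(placement)
    for task, state in states.items():
        peers[placement[task]].local[task] = (iteration, list(state))
        # Buddy = peer currently hosting Task (task + 1) mod n; no central server.
        buddy = placement[(task + 1) % n]
        peers[buddy].replicas[task] = (iteration, list(state))


def recover(peers: List[Peer], placement: Dict[int, int], failed: int,
            spare: int) -> Tuple[int, Dict[int, List[float]]]:
    """Restart the failed peer's Task on `spare` and roll every Task back."""
    peers[failed].alive = False
    n = len(placement)
    iterations = set()
    states: Dict[int, List[float]] = {}
    for task, host in placement.items():
        if host == failed:
            # The local copy died with the peer: fetch the buddy's replica.
            iteration, state = peers[placement[(task + 1) % n]].replicas[task]
            placement[task] = spare
            peers[spare].local[task] = (iteration, list(state))
        else:
            iteration, state = peers[host].local[task]
        iterations.add(iteration)
        states[task] = list(state)
    # Coordinated checkpoints: all Tasks must restart from the same iteration,
    # otherwise halo exchanges would mix data from different time steps.
    if len(iterations) != 1:
        raise RuntimeError("inconsistent checkpoints")
    return iterations.pop(), states
```

In this model a single peer failure is tolerated because the lost Task's state survives on its buddy; losing a Task and its buddy at the same time would require more replicas. Replicas held by the failed peer are lost as well, so a real implementation would presumably take a fresh checkpoint soon after recovery.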
Disciplines :
Computer science
Author, co-author :
Dethier, Gérard ; Université de Liège - ULiège > Département de chimie appliquée > Génie chimique - Opérations physiques unitaires
Briquet, Cyril ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Informatique (ingénierie du logiciel et algorithmique)
Marchot, Pierre ; Université de Liège - ULiège > Département de chimie appliquée > Génie chimique - Systèmes polyphasiques
de Marneffe, Pierre-Arnoul ; Université de Liège - ULiège > Dép. d'électric., électron. et informat. (Inst.Montefiore) > Informatique (ingénierie du logiciel et algorithmique)
Language :
English
Title :
LBG-SQUARE - Fault Tolerant, Locality-Aware Co-allocation in P2P Grids
Publication date :
December 2008
Event name :
Ninth International Conference on Parallel and Distributed Computing, Applications and Technologies (PDCAT'08)
Event organizer :
University of Otago
Event place :
Dunedin, New Zealand
Event date :
1–4 December 2008
Audience :
International
Main work title :
Ninth International Conference On Parallel And Distributed Computing, Applications And Technologies : PDCAT 2008
Editor :
Huang, Zhiyi
Xu, Zhiwei
Rountree, Nathan
Lefevre, Laurent
Shen, Hong
Hine, John
Pan, Yi
Publisher :
IEEE Computer Society, Los Alamitos, California, United States