InstantGrid

InstantGrid Project

A Framework for On-Demand Grid Point Construction - Toward “pervasive utility computing”

Members: Roy S.C. Ho, K.K. Yin, David C.M. Lee, Daniel H.F. Hung, Cho-Li Wang, Francis C.M. Lau. (Department of Computer Science, The University of Hong Kong, The Systems Research Group)

Figure 1: InstantGrid: A Grid Service for On-Demand Grid Point Construction

Introduction

InstantGrid is a framework for the efficient construction of grid points. It is designed to simplify software management in grid systems and can instantly turn any computer into a grid-ready platform with optimized runtime performance. The execution environment (EE) is centrally managed on an InstantGrid server and can be disseminated to, and launched on, remote compute nodes at system boot. These features also facilitate the ad hoc formation of grid platforms on idle computers. The framework comprises the following core components.

Centralized and application-centric software management. All OS images and grid middleware are stored and managed on a central InstantGrid server. These software components are grouped into distinct, predefined EE’s; each EE targets a specific type of application. For example, service-oriented distributed applications and job submission-based HPC rely on two very different EE’s. This model guarantees a well-defined EE for (and hence compatibility with) each kind of grid application. The centrally managed EE’s are disseminated to the compute nodes on demand over the network, according to the application requirements.
Proactive software configuration. Instead of installing and configuring operating systems and middleware incrementally after they reach the compute nodes, all software components in an EE must be pre-configured on the InstantGrid server. In other words, software is not disseminated to the compute nodes until every component is ready to run as part of the desired EE. This approach shortens the time needed to compose and switch between EE’s.
Performance optimization techniques. The centralized management model implies that an entire EE (which could be as large as a few gigabytes) has to be disseminated to the compute nodes on demand. Replicating all files is impractical, yet existing network booting approaches that rely entirely on the Network File System (NFS) result in poor runtime performance. We aim to address this problem with efficient I/O caching techniques that avoid excessive file transfer. In addition, discriminative file sharing mechanisms select a suitable strategy (e.g., NFS sharing or replication) for each file according to its usage pattern, which optimizes both dissemination and runtime performance.
In-memory execution mode. We aim to support a scenario in which the data and OS held in the permanent storage of the compute nodes are not altered (or even accessed) while an EE obtained over the network executes, i.e., a completely in-memory operation. This is especially useful for supporting grid computing on existing cluster platforms, desktop/home computers, and diskless blade servers.
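The discriminative file sharing idea described above can be illustrated with a small sketch. This is not the SLIM implementation: the `FileUsage` fields, the `hot_reads` threshold, and the rule that node-written files are replicated are all assumptions made for illustration. The only point taken from the text is that each file gets a strategy (NFS-shared or replicated) chosen from its usage pattern.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    NFS_SHARED = "nfs-shared"  # file stays on the server and is read over NFS
    REPLICATED = "replicated"  # file is copied to the compute node


@dataclass(frozen=True)
class FileUsage:
    """Hypothetical per-file usage record; not part of InstantGrid or SLIM."""
    path: str
    size_bytes: int
    reads_per_run: int     # assumed profiling counter
    written_by_node: bool  # True if each node modifies its own copy


def choose_strategy(usage: FileUsage, hot_reads: int = 10) -> Strategy:
    """Pick a strategy from a file's usage pattern (assumed rules)."""
    # Assumption: frequently read or node-written files are worth copying;
    # everything else is left on the server and fetched only if touched.
    if usage.written_by_node or usage.reads_per_run >= hot_reads:
        return Strategy.REPLICATED
    return Strategy.NFS_SHARED


def plan(files: list[FileUsage]) -> tuple[dict[str, Strategy], int]:
    """Return the per-file strategy and the total bytes to replicate."""
    choices = {f.path: choose_strategy(f) for f in files}
    replicated = sum(
        f.size_bytes for f in files if choices[f.path] is Strategy.REPLICATED
    )
    return choices, replicated


if __name__ == "__main__":
    files = [
        FileUsage("/lib/libc.so.6", 1_500_000, 400, False),
        FileUsage("/usr/share/doc/README", 20_000, 0, False),
        FileUsage("/var/log/messages", 0, 1, True),
    ]
    choices, replicated = plan(files)
    for path, strategy in choices.items():
        print(path, strategy.value)
    print("bytes replicated:", replicated)
```

Under these assumed rules, only the hot library and the node-written log are copied, so a rarely read file costs nothing at dissemination time and is paid for only if a node actually opens it.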

We developed a reference implementation of InstantGrid, which manages and disseminates a commodity Linux OS and grid middleware to construct production grid environments. In our design, the performance optimization techniques and the in-memory execution support are integrated into a toolkit called SLIM (Single Linux Image Management), which forms the low-level support for disseminating EE’s in InstantGrid. While InstantGrid is specifically designed for grid point construction, the SLIM component could also be used for convenient software management and system administration in distributed systems. We conducted experiments with the implemented prototype; the results demonstrate that a 256-node grid point can be constructed from scratch in 5 minutes. This grid point was equipped with Fedora Core 1 Linux, Globus Toolkit 3, the Portable Batch System (PBS), and the Ganglia cluster monitoring package.

Execution Flow of InstantGrid


1. Software installation at SLIM server
2. Client boots and obtains kernel
3. OS image/App disseminated
4. Process to generate certificates
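
The four steps above can be read as a strictly ordered pipeline. The sketch below models that ordering only; the `Stage` and `ConstructionRun` names are hypothetical and do not come from InstantGrid or SLIM, and the real system performs each step over the network rather than by a method call.

```python
from dataclasses import dataclass, field
from enum import IntEnum


class Stage(IntEnum):
    """The four steps of the execution flow, in order."""
    SOFTWARE_INSTALLED_ON_SERVER = 1
    CLIENT_BOOTED_WITH_KERNEL = 2
    IMAGE_DISSEMINATED = 3
    CERTIFICATES_GENERATED = 4


@dataclass
class ConstructionRun:
    """Hypothetical record of how far one compute node has progressed."""
    node: str
    completed: list[Stage] = field(default_factory=list)

    @property
    def grid_ready(self) -> bool:
        # The node counts as grid-ready once all four stages are done.
        return len(self.completed) == len(Stage)

    def advance(self, stage: Stage) -> None:
        """Record a stage, rejecting any stage that arrives out of order."""
        if self.grid_ready:
            raise ValueError(f"{self.node}: all stages already completed")
        expected = Stage(len(self.completed) + 1)
        if stage is not expected:
            raise ValueError(
                f"{self.node}: expected {expected.name}, got {stage.name}"
            )
        self.completed.append(stage)


if __name__ == "__main__":
    run = ConstructionRun("node-001")
    for stage in Stage:  # iterates in definition order
        run.advance(stage)
        print(run.node, "completed", stage.name)
    print("grid ready:", run.grid_ready)
```

Rejecting out-of-order stages mirrors the proactive configuration rule: an image is never disseminated before the server-side installation has finished.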

Publications:

Roy S.C. Ho, K.K. Yin, David C.M. Lee, Daniel H.F. Hung, Cho-Li Wang, and Francis C.M. Lau, "InstantGrid: A Framework for On-Demand Grid Point Construction," The International Workshop on Grid and Cooperative Computing (GCC 2004), pp. 911-914, Oct 21-24, 2004, Wuhan, China.
Roy S.C. Ho, David C.M. Lee, Daniel H.F. Hung, Cho-Li Wang, and Francis C.M. Lau, "On Managing Execution Environments for Utility Computing," Proceedings of Network Research Workshop 2004, 18th Asia-Pacific Advanced Network Meetings (APAN 2004), pp. 175-182, July 6, 2004, Cairns, Australia.

Related Work

HKU Grid Research Project Main Page
HKU SLIM project: http://slim.cs.hku.hk/
G-JavaMPI (Java-MPI binding for parallel computing in Grid)

Last revised: Thursday, December 23, 2004