A Framework for On-Demand Grid Point Construction - Toward “pervasive
Members: Roy S.C. Ho, K.K. Yin, David C.M. Lee, Daniel H.F. Hung, Cho-Li Wang, Francis C.M. Lau.
(Department of Computer Science, The University of Hong Kong,
The Systems Research Group)
Figure 1: InstantGrid: A Grid Service for On-Demand Grid Point
InstantGrid is a framework for efficient
construction of grid points. This new framework is designed to simplify software
management in grid systems, and is capable of instantly turning any computer into a
grid-ready platform with optimized runtime performance. The execution
environment (EE) is centrally managed in an InstantGrid server, and can be
disseminated to and launched on remote compute nodes upon system boot-up. The
advanced features also facilitate ad-hoc formation of grid platforms on idle
computers. The framework comprises the following core components.
Centralized and application-centric software
management. All OS images and grid middleware are
stored and managed in a central InstantGrid server. These software components
are grouped into distinct, pre-defined EE’s; each EE targets a specific
type of application. For example, service-oriented distributed applications
and job submission-based HPC rely on two very different EE’s. This model
guarantees well-defined EE’s for (and hence compatibility with) various grid
applications. The centrally managed EE’s are disseminated to the compute nodes
on-demand through the network, according to the application requirements.
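The application-centric grouping described above can be pictured as a lookup from application type to a pre-defined EE held on the server. The following is only a minimal sketch under stated assumptions: the class name, the catalog keys, the image name and the middleware groupings are all hypothetical and are not taken from the InstantGrid implementation.

```python
from dataclasses import dataclass
from typing import Dict, Tuple


@dataclass(frozen=True)
class ExecutionEnvironment:
    """One pre-defined EE: an OS image plus the middleware bundled with it."""
    name: str
    os_image: str
    middleware: Tuple[str, ...]


# Hypothetical catalog kept on the central server. The keys and the
# contents of each EE are illustrative assumptions only.
EE_CATALOG: Dict[str, ExecutionEnvironment] = {
    "service-oriented": ExecutionEnvironment(
        name="ee-services",
        os_image="linux-base",
        middleware=("grid-service-container",),
    ),
    "hpc-batch": ExecutionEnvironment(
        name="ee-hpc",
        os_image="linux-base",
        middleware=("batch-scheduler", "cluster-monitor"),
    ),
}


def select_ee(application_type: str) -> ExecutionEnvironment:
    """Return the pre-defined EE for an application type, or fail loudly."""
    try:
        return EE_CATALOG[application_type]
    except KeyError:
        raise ValueError(
            f"no pre-defined EE for application type {application_type!r}"
        ) from None


if __name__ == "__main__":
    ee = select_ee("hpc-batch")
    print(ee.name, ee.middleware)
```

Rejecting unknown application types, rather than falling back to a default, mirrors the idea that every application runs in a well-defined EE.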
Proactive software configuration.
Instead of installing and configuring OS’es and middleware incrementally after
they are disseminated to the compute nodes, all software components in a
specific EE are required to be pre-configured in the InstantGrid server. In
other words, no software is disseminated to the compute nodes until every
component is ready to be executed as part of the desired EE. This approach
shortens the time needed to compose and switch between EE’s.
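The rule that nothing leaves the server until the whole EE is pre-configured amounts to an all-or-nothing readiness check. A minimal sketch follows, assuming a hypothetical `Component` record with a `configured` flag; the actual bookkeeping used by InstantGrid is not described in the text.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Component:
    """A software component of an EE (hypothetical record for illustration)."""
    name: str
    configured: bool


def unconfigured(components: List[Component]) -> List[str]:
    """Names of the components that still need configuration on the server."""
    return [c.name for c in components if not c.configured]


def ready_for_dissemination(components: List[Component]) -> bool:
    """An EE may be disseminated only if it is non-empty and fully configured."""
    return bool(components) and not unconfigured(components)


if __name__ == "__main__":
    ee = [Component("os-image", True), Component("middleware", False)]
    if not ready_for_dissemination(ee):
        print("blocked by:", unconfigured(ee))
```

Treating an empty component list as not ready is a design choice of this sketch, made so that a misdefined EE is never shipped by accident.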
Performance optimization techniques.
The centralized management model implies that an entire EE (which could be as
large as a few gigabytes) has to be disseminated to the compute nodes on-demand.
Replicating all files is clearly impractical, yet existing network booting
approaches that rely entirely on the network file system (NFS) result in poor
runtime performance. We aim to address this problem by exploiting efficient I/O
caching techniques to avoid excessive file transfer. In addition, the
discriminative file sharing mechanisms select a suitable strategy (e.g.,
NFS-shared or replication) for each file according to its usage pattern, which
optimizes both the dissemination and runtime performance.
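To illustrate what choosing a sharing strategy per file could look like, here is a small sketch. The decision rules and thresholds are assumptions made for this example (files written by a node or read often are replicated, everything else stays on NFS); the text does not specify the criteria InstantGrid actually uses, and all names below are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class Strategy(Enum):
    NFS_SHARED = "nfs-shared"   # served from the server on access
    REPLICATED = "replicated"   # copied to the compute node


@dataclass
class FileUsage:
    """Observed usage pattern of one file (hypothetical fields)."""
    path: str
    size_bytes: int
    reads_per_boot: int
    node_writes: bool


def choose_strategy(
    usage: FileUsage,
    hot_reads: int = 10,                        # assumed threshold
    max_replica_bytes: int = 64 * 1024 * 1024,  # assumed threshold
) -> Strategy:
    """Pick a sharing strategy from a file's usage pattern."""
    # Files a node writes to need a private copy.
    if usage.node_writes:
        return Strategy.REPLICATED
    # Frequently read files of moderate size are cheaper to copy once.
    if usage.reads_per_boot >= hot_reads and usage.size_bytes <= max_replica_bytes:
        return Strategy.REPLICATED
    # Everything else stays on the server and is fetched on demand.
    return Strategy.NFS_SHARED


if __name__ == "__main__":
    doc = FileUsage("/usr/share/doc/README", 4096, 0, False)
    print(doc.path, choose_strategy(doc).value)
```

Rarely touched files never cross the network unless they are read, which keeps dissemination short, while hot files avoid repeated remote reads at runtime.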
In-memory execution mode.
We aim to support a scenario in which the data and OS held in the permanent
storage of the compute nodes are not altered (or even accessed) while an
EE obtained via the network executes, i.e., a completely in-memory operation.
This is especially useful for supporting grid computing in existing cluster
platforms, desktop/home computers, and diskless blade servers.
We developed a reference implementation of
InstantGrid, which manages and disseminates the commodity Linux OS and grid
middleware to construct production grid environments. In our design, the
performance optimization techniques and the in-memory execution support are
integrated into a toolkit called SLIM
(Single Linux Image Management), which forms the low-level support for
disseminating EE’s in InstantGrid. While InstantGrid is specifically designed
for grid point construction, the SLIM component could be used for convenient
software management and system administration in distributed systems. We
conducted experiments with the implemented prototype; the results demonstrate
that a 256-node grid point can be constructed in 5 minutes from scratch. This
grid point was equipped with the Fedora Linux Core 1, Globus Toolkit 3, Portable
Batch System (PBS), and the Ganglia cluster monitoring package.
Execution Flow of InstantGrid
1. Software installation at
2. Client boots and obtains kernel
3. OS image/App disseminated
Process to generate certificates
Roy S.C. Ho, K.K. Yin, David C.M. Lee,
Daniel H.F. Hung, C.L. Wang, and Francis C.M. Lau, "InstantGrid: A Framework
for On-Demand Grid Point Construction," The International Workshop
on Grid and Cooperative Computing (GCC 2004),
Oct 21-24, 2004, Wuhan,
Roy S.C. Ho, David C.M. Lee, Daniel
H.F. Hung, Cho-Li Wang, and Francis C.M. Lau, "On Managing Execution
Environments for Utility Computing," Proceedings of Network Research
Workshop 2004, 18th Asia-Pacific Advanced Network Meetings (APAN 2004),
Cairns, Australia, July 6, 2004, pp. 175-182.