<h3>
Parallel and Distributed Processing and Applications, International
Symposium (ISPA 2003), Aizu, Japan, July 2-4, 2003, and
<br>
The 5th Workshop on High Performance Scientific and Engineering
Computing with Applications (HPSECA-03), in conjunction with ICPP
2003, Kaohsiung, Taiwan, October 6-9, 2003</h3>

<p>
<b>Title:</b>
Towards a Single System Image for High-Performance Java

<p>
<b>Abstract:</b>
Multithreaded programming in Java is an attraction to programmers
writing high-performance code if their programs can exploit the
multiplicity of resources in a cluster environment. Unfortunately, the
common platforms today have not made it possible to allow a single
Java program to span multiple computing nodes, let alone dynamically
to cling on to additional nodes or migrate some of its executing code
from one node to another for load balancing reasons during runtime. As
a result, programmers resort to parallel Java programming using such
devices as MPI. Ideally, a "Single System Image" (SSI) offered by the
cluster is all that is needed. But the purist's idea of SSI is not at
all easy to achieve if indeed it can be achieved. A workable SSI
solution will likely require a major concerted effort by designers on
all fronts, from those of the hardware and OS to those looking after
the upper or middleware layers. In the meantime, partial SSI
implementations offer limited but useful capabilities and help clear
the way for more complete SSI implementations in the future. We
present in this talk our brave attempts to provide partial SSI in a
cluster for the concurrent Java programmers, and discuss how the
design of the Java Virtual Machine (JVM) has made it possible (or has given
us some of the troubles). At the core of our present design are a
thread migration mechanism that works for Java threads compiled in
Just-In-Time (JIT) mode, and an efficient global object space that enables
cross-machine access of Java objects. We close with some thoughts on
what can be done next to popularize our or similar approaches.
The following gives an overview of the system we have
implemented
which will serve as a major guiding example for our discussion.

<p>
SSI can be implemented at different layers such as the hardware,
the OS, or as a middleware. We choose to implement
SSI as a middleware by extending the JVM, resulting is a
"distributed JVM" (DJVM).
Our DJVM supports the scheduling of Java threads on cluster nodes and
provides locality transparency to object accesses and I/O operations.
The semantics of a thread's execution on the DJVM
will be preserved as if it were executed in a single node.

<p>
The DJVM needs to extend
the three main building blocks of the single-node JVM: the
execution engine, the thread scheduler, and the heap.
Similar to the single-node JVM runtime
implementation, the execution engine can be classified into four
types: interpreter-based engine, JIT compiler-based engine,
mixed-mode execution engine, and static compiler-based engine.
There are two modes of thread scheduling that can be found in current
DJVM prototypes: static thread distribution and
dynamic thread migration. The heap in the DJVM needs to provide a
shared memory space for all the Java threads scattered in various
cluster nodes, and there are two main approaches: realizing a heap by
adopting a distributed shared memory system or by extending the heap.

<p>
Our proposed DJVM exploits the power of the
JIT compiler to achieve the best possible performance.
We introduce a new cluster-aware Java
execution engine to support the execution of
distributed Java threads in JIT compiler mode. The results we
obtained show a major improvement in performance over the old
interpreter-based implementation. A dynamic thread migration
mechanism is implemented to support the flexible
distribution of Java threads so that Java threads can be migrated
from one node to another during execution.

<p>
Among the many challenges in realizing a migration mechanism for
Java threads, the transferring of thread contexts between cluster
nodes requires the most careful and meticulous design.
In a JIT-enabled JVM, the JVM stack of a Java thread becomes a native
stack and no longer remains bytecode oriented.
We solve the problem of transformation of native Java thread context
directly inside the JIT compiler.
We extend the heap in the JVM to a Global Object
Space (GOS) that creates an SSI view for all Java threads inside
an application. The GOS support enables location transparent
access not only to data objects but also to I/O objects. As
the GOS is built inside JVM, we exploit the JVM runtime
components such as the JIT compiler and the threaded I/O functions
to minimize the remote object access
overheads in a distributed environment.

<p>
As the execution speed in JIT mode is typically much faster than
that in interpreter mode, the cost to access an
object in a JIT compiler enabled DJVM becomes relatively high,
which in turn puts pressure on the efficiency of the heap
implementation. To reduce the communication overheads, optimizing
caching protocols are used in our design. We employ an "adaptive
object home migration protocol" to address the problem of frequent
write accesses to a remote object, and a timestamp-based fetching
protocol to prevent the redundant fetching of
remote objects, among other optimization tricks.

<p>
<a href=icpp.ppt>PPT</a> (1.3M)