JESSICA (Java-Enabled Single-System-Image Computing Architecture) is a distributed Java Virtual Machine (DJVM) that supports parallel execution of multithreaded Java applications in a networked cluster environment. JESSICA makes the whole cluster look like a single machine running a single JVM. With the JESSICA2 DJVM, a single Java program can span multiple computing nodes and thereby harness the cluster's aggregate computation power, memory space, and I/O capabilities.

JESSICA V1.0 (1996-1999)
  • The first distributed Java virtual machine that supported transparent thread migration
  • Global Heap: built on top of TreadMarks (Lazy Release Consistency + homeless)
  • JVM kernel modification
  • Execution mode: Interpreter Mode (slow)
 
JESSICA V2.0 (2000-2006)  
  • Built-in Global Heap (Lazy Release Consistency + migrating-home)
  • Execution mode: JIT-Compiler Mode (full speed)
  • HKU JESSICA2 Project Webpage
 
JESSICA V3.0 (2006-)  
  • Runs above the JVM (on the Sun JDK)
  • JVMTI-based state extraction
  • Supports a Large Object Space (Scope Consistency)
  • Speed depends on the underlying JVM (currently Java 2 SDK)
  • HKU JESSICA3 Project Webpage
 

JESSICA V4.0 (2009-)

With the emergence of multicore processors, cluster computing has entered a "multicore" era. This technological shift calls for a new parallel programming paradigm that strikes the best balance between programmability and scalability. Among relevant research efforts, the partitioned global address space (PGAS) programming model has been gaining momentum in recent years; it is the basis of Unified Parallel C, Co-array Fortran, Titanium, Fortress (Sun), Chapel (Cray), and X10 (IBM). This model provides ease of use through a virtualized shared memory while relying on programmers to maintain shared-data locality for high performance. This level of abstraction may be adequate for expert scientific application programmers, but not for software developers in the much wider application domains of web and enterprise computing. In particular, their applications usually access shared data at small granularity and synchronize frequently to ensure thread safety. With such fine-grained sharing, scaling up performance by adding cores is difficult because of severe cache-level contention among threads and high-latency communication across cluster nodes.
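To make the notion of fine-grained sharing concrete, the short Java fragment below is an illustrative example (not taken from any JESSICA application or benchmark) of a pattern common in web and enterprise code: many threads repeatedly update small shared objects inside synchronized blocks, so lock contention and, on a cluster, cross-node coherence traffic dominate as the core count grows.

    import java.util.HashMap;
    import java.util.Map;

    // Illustrative example of fine-grained sharing: many threads update
    // small shared counters under one monitor. Each update touches only a
    // few bytes but still serializes on the shared lock, so adding cores
    // (or spreading threads across cluster nodes) yields little speedup.
    public class PageHitCounter {
        private final Map<String, Long> hits = new HashMap<>();

        public void record(String url) {
            synchronized (hits) {               // frequent, short critical section
                hits.merge(url, 1L, Long::sum); // tiny shared-data update
            }
        }

        public long get(String url) {
            synchronized (hits) {
                return hits.getOrDefault(url, 0L);
            }
        }
    }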

In this research, we aim to solve the challenging scalability issues of fine-grained distributed object sharing over a multicore cluster by proposing new mechanisms, protocols, and adaptive runtime strategies for synchronizing shared-object access. In view of Java's popularity and its much-improved speed, we believe extending the Java multithreaded programming model to the multicore cluster environment is a promising programming paradigm. However, Java's purely lock-based synchronization model is prone to scalability bottlenecks. While transaction-based synchronization has been proposed to ease exclusive blocking, its commit/rollback semantics may generate more traffic in clusters. We propose a load-time automated code translation approach that replaces the original lock-based Java synchronization code with a novel synchronization construct called Tweak (two-way elastic atomic block). A Tweak exhibits a switchable duality of lock and transaction, achieving high concurrency while preserving lock-like semantics that allow lazier memory consistency maintenance in a cluster environment. To best exploit the dual nature of Tweaks and processor core affinity, we also propose a lightweight profiling framework that tracks transaction abort rates and the working set of shared objects. Such profiling information is useful for devising adaptive optimizations, including thread placement onto cores, object prefetching, thread concurrency control, and lowering the conflict-detection granularity to eagerly repair doomed conflicting patterns. We will implement the proposed methodologies as a new distributed Java virtual machine prototype and evaluate its performance and the effectiveness of the adaptive optimizations. With such a system, fine-grained parallel applications can be scaled readily on top of a generic and scalable programming paradigm.
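The sketch below illustrates the Tweak idea only; the class and method names (Tweak, execute, ABORT_THRESHOLD) and the switching policy are assumptions for illustration, not the project's actual API. It shows how a critical section originally written with synchronized could, after load-time translation, run under a construct that chooses between a transactional path and a lock path based on a profiled abort rate.

    import java.util.concurrent.atomic.AtomicLong;
    import java.util.concurrent.locks.ReentrantLock;

    // Hypothetical sketch of a two-way elastic atomic block ("Tweak").
    // Names and policy are illustrative assumptions, not JESSICA code.
    final class Tweak {
        private final ReentrantLock lock = new ReentrantLock();
        private final AtomicLong attempts = new AtomicLong();
        private final AtomicLong aborts = new AtomicLong();
        private static final double ABORT_THRESHOLD = 0.3; // assumed switch point

        // Runs the critical section transactionally while aborts are rare,
        // otherwise falls back to plain mutual exclusion.
        void execute(Runnable criticalSection) {
            if (abortRate() < ABORT_THRESHOLD && tryTransactional(criticalSection)) {
                return;                       // optimistic path committed
            }
            lock.lock();                      // pessimistic, lock-like path
            try {
                criticalSection.run();
            } finally {
                lock.unlock();
            }
        }

        private boolean tryTransactional(Runnable criticalSection) {
            attempts.incrementAndGet();
            try {
                // A real system would run this under a software transactional
                // memory runtime; this stub simply executes it and treats a
                // runtime exception as a conflict-induced abort.
                criticalSection.run();
                return true;
            } catch (RuntimeException conflict) {
                aborts.incrementAndGet();
                return false;
            }
        }

        private double abortRate() {
            long a = attempts.get();
            return a == 0 ? 0.0 : (double) aborts.get() / a;
        }
    }

    // Before translation:  synchronized (account) { account.deposit(amount); }
    // After translation:   accountTweak.execute(() -> account.deposit(amount));

The point of the sketch is the duality: the same critical section can run optimistically when conflicts are rare and pessimistically when the profiled abort rate rises, without changing the application code.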

This project is supported by an RGC grant under the project title "Adaptive Software Support for Fine-Grained Distributed Object Sharing on Multicore Clusters", PI: Dr. Cho-Li Wang, project period: 09/2009-08/2011.

Project Students:

King Tin LAM, Ph.D. (9/2006-), Fine-Grained Object Sharing in Distributed Java Virtual Machine
Fangwei (Alan) Li, Ph.D. (9/2006-), Huge Object Space Support for Distributed Java Virtual Machine, JESSICA3 Project
Kinson Chan (陳傑信), Ph.D. (9/2009-), TrC-MC: An Adaptive Software Transactional Memory Support for Multi-Core Programming
Chenggang Zhang (张呈刚), M.Phil. (09/2009-08/2011)

Last Modification: Dec. 05, 2009 (by C.L. Wang)