Research Projects - The Systems Research Group

Supervisor: Prof. C.L. WANG

We have a few PhD (or 4-year direct-entry PhD) positions open this year for self-motivated and academically strong students. If you are interested in one of the projects, please contact me at clwang@cs.hku.hk. Interviews will be arranged for qualified students.

A Talk on New Perspectives and New Technologies for Building Big Data and Cloud Computing Platforms, by C.L. Wang (05/10/2014): PPT

1	Japonica+: Speculative Java Computing on GPUs (9/2014 --)
	GPUs open up new opportunities for accelerating Java programs for high-speed big data analytics. In this new project, we will develop a portable Java library and runtime environment, "Japonica+", that supports GPU acceleration of auto-parallelized loops with non-deterministic data dependencies. The runtime will parallelize a sequential Java program (with non-deterministic data dependencies) into parallel workloads that run concurrently on the CPU (as either Java threads or OpenCL x86 kernels) and on the GPU (as OpenCL kernels), utilizing all the CPU and GPU computing resources. Tasks to be done: (1) automatic translation from Java bytecode to OpenCL, (2) auto-parallelization of loops with non-deterministic data dependencies (see our GPU-TLS paper), (3) dynamic load scheduling and rebalancing via task migration between CPU and GPU, (4) virtual shared memory support between the host and multiple GPU cards. The whole project will be developed on recent Nvidia K40 GPU cards.
	1-2 Ph.D students (first priority): Solid background in compiler techniques (e.g., loop parallelization, dependency checking) and GPU hardware architecture (Nvidia or AMD GPUs), and good experience in GPU programming (CUDA or OpenCL). 1-2 RAs: can apply at any time if you have the above experience (especially OpenCL).
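To make the phrase "loops with non-deterministic data dependencies" concrete, the following is a minimal CPU-only sketch in Java of speculative loop parallelization. It is not the Japonica+ or GPU-TLS implementation: the class and method names (`SpeculativeLoop`, `sequential`, `speculative`) and the loop body `a[w[i]] = a[r[i]] + 1` are assumptions chosen for illustration. Whether two iterations depend on each other is only known once the index arrays `r` and `w` are available at run time, so the sketch runs all iterations in parallel against the pre-loop values, checks afterwards for a read-after-write violation, and falls back to sequential execution if one is found.

```java
import java.util.stream.IntStream;

final class SpeculativeLoop {
    // Reference semantics of the loop: a[w[i]] = a[r[i]] + 1, in iteration order.
    static void sequential(int[] a, int[] r, int[] w) {
        for (int i = 0; i < r.length; i++) {
            a[w[i]] = a[r[i]] + 1;
        }
    }

    // Returns true if the speculative results were committed,
    // false if a dependency violation forced sequential re-execution.
    static boolean speculative(int[] a, int[] r, int[] w) {
        int n = r.length;
        int[] result = new int[n];

        // Speculation: every iteration reads the pre-loop value of a[r[i]]
        // and buffers its result; the array a is not modified in this phase.
        IntStream.range(0, n).parallel().forEach(i -> result[i] = a[r[i]] + 1);

        // Validation: iteration i misspeculated if it read a location that
        // an earlier iteration writes (a true read-after-write dependency).
        boolean[] written = new boolean[a.length];
        for (int i = 0; i < n; i++) {
            if (written[r[i]]) {
                sequential(a, r, w); // a is still untouched, so this is safe
                return false;
            }
            written[w[i]] = true;
        }

        // Commit in iteration order, so the last writer to a location wins.
        for (int i = 0; i < n; i++) {
            a[w[i]] = result[i];
        }
        return true;
    }
}
```

With `a = {10, 20, 30, 40}`, `r = {0, 1}` and `w = {2, 3}` the iterations are independent and the speculative results are committed, giving `{10, 20, 11, 21}`. With `a = {1, 0, 0, 0}`, `r = {0, 1, 2}` and `w = {1, 2, 3}` each iteration reads what the previous one wrote, so validation fails and the loop is re-run sequentially, giving `{1, 2, 3, 4}`.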
	Reference:
	OpenCL: https://www.khronos.org/opencl/, https://developer.nvidia.com/opencl (Nvidia)
	Nvidia Tesla K40 GPU: http://www.nvidia.com/object/tesla-servers.html
	HKU GPU Cluster: 12 x IBM iDataPlex dx360 M3 servers connected by a QDR IB switch, each with one Nvidia M2050 GPU. (See: http://i.cs.hku.hk/~clwang/Gideon-II/)
	Previous Japonica (Java with Auto-Parallelization ON GraphIcs Co-processing Architecture) project: http://i.cs.hku.hk/~clwang/projects/Japonica.html
	Guodong Han, Chenggang Zhang, King Tin Lam, and Cho-Li Wang, Java with Auto-Parallelization on Graphics Coprocessing Architecture, 42nd International Conference on Parallel Processing (ICPP2013), October 1-4, 2013, Lyon, France. (pdf)
	Chenggang Zhang, Guodong Han, Cho-Li Wang, GPU-TLS: An Efficient Runtime for Speculative Loop Parallelization on GPUs, 13th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid), May 13-16, 2013, Delft, The Netherlands. (pdf)
	Previous projects on Distributed Java Virtual Machines: http://www.cs.hku.hk/~clwang/projects/JESSICA2.html
	TrC-MC*: Software Transactional memory on multi-core.

2	Crocodiles: Scalable Cloud-on-Chip Runtime Support with Software Coherence for Future 1000-Core Tiled Architectures, HKU 716712E, 9/2012-8/2015, supported by HK RGC.
	Scaling parallelism up to 1,000 cores requires a fairly radical rethinking of how system software is designed. As the number of cores grows, providing hardware-level cache coherence becomes increasingly complicated and costly, leading researchers to advocate abandoning it if future many-core architectures are to stay inherently scalable. Software then has to take on the role of ensuring data coherence among cores. In this research, we address these issues and propose novel methodologies to build a scalable CoC ("Cloud on Chip") runtime platform, dubbed Crocodiles (Cloud Runtime with Object Coherence On Dynamic tILES), for future 1000-core tiled processors. Crocodiles involves the development of two important software subsystems: (1) a cache coherence protocol and (2) DVFS-based power management. (Currently, 3 Ph.D students are working on this project.)
	1 Ph.D student: strong background in OS kernels and thorough knowledge of the memory subsystem (cache/DRAM, paging) and cache coherence protocols. A strong background in software distributed shared memory systems (e.g., TreadMarks, JiaJia, JUMP) and programming experience in multicore power management systems are required.
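As a rough illustration of what it means for software to take over data coherence, the sketch below simulates, in plain single-threaded Java, two cores with private software-managed caches over a shared memory that has no hardware coherence. It follows the acquire/release style commonly used by software distributed shared memory systems and is not the Crocodiles or Rhymes protocol; every name in it (`SoftwareCoherenceSketch`, `SharedMemory`, `Core`, `acquire`, `release`) is hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class SoftwareCoherenceSketch {
    // Shared memory without hardware coherence, modeled as a plain array.
    static final class SharedMemory {
        final int[] words;
        SharedMemory(int size) { words = new int[size]; }
    }

    // One simulated core with a private cache that only software keeps coherent.
    static final class Core {
        private final SharedMemory mem;
        private final Map<Integer, Integer> cache = new HashMap<>();
        private final Map<Integer, Integer> dirty = new HashMap<>();

        Core(SharedMemory mem) { this.mem = mem; }

        // Reads hit the private cache first, so they may return stale data.
        int read(int addr) {
            return cache.computeIfAbsent(addr, k -> mem.words[k]);
        }

        // Writes stay in the private cache until the next release.
        void write(int addr, int value) {
            cache.put(addr, value);
            dirty.put(addr, value);
        }

        // Acquire: discard clean cached copies so later reads refetch from
        // shared memory; this core's own unflushed writes are kept.
        void acquire() {
            cache.clear();
            cache.putAll(dirty);
        }

        // Release: flush this core's writes to shared memory.
        void release() {
            dirty.forEach((addr, value) -> mem.words[addr] = value);
            dirty.clear();
        }
    }
}
```

In this model, if core 0 writes 42 to address 0 and calls `release()`, core 1 keeps reading its stale cached 0 until it calls `acquire()`, after which it reads 42. The cost of coherence is thereby paid only at synchronization points rather than on every memory access.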
	Reference:
	“Rhymes: A Shared Virtual Memory System for Non-Coherent Tiled Many-Core Architectures,” to appear in ICPADS2014.
	“Latency-aware Dynamic Voltage and Frequency Scaling on Many-core Architecture for Data-intensive Applications,” CloudCom-Asia 2013, Fuzhou, China, Dec. 16-18, 2013.


3	Cloud Computing and In-Memory Computing
	Performance Optimization of Apache Spark on SDN-connected Clusters (now: 2 M.Sc students + one PhD student). One M.Sc student (Mr. Ying Li) is now implementing "Nesox", a network resource scheduler for data-parallel computing that leverages SDN techniques.
	1 Ph.D student: must have a strong interest in cloud computing and network virtualization techniques.
	Reference: Apache Spark: https://spark.apache.org/ Software-Defined Networking (SDN): Read ONF

4	OS-1K: New Operating System for Manycore Systems (“NoHype” Cloud Operating System (Macedonian Phalanx Decoupled Operating System))
	Traditional operating systems are based on the sequential execution model developed in the 1960s. Such operating systems cannot address new many-core parallel hardware architectures without major redevelopment. For instance, how can you harness the power of a next-generation manycore processor with >1,000 cores? We will investigate various perspectives on future OS design towards this goal. We are developing an x86-based full-system simulator based on Gem5. (Currently, one Ph.D student is working on this project.)
	1 Ph.D student: must have some experience in OS kernel development. Some experience with Barrelfish or sccLinux will be quite helpful. The student is also required to have good knowledge of multicore hardware architecture.

Updated: Sept. 06, 2014