Supervisor: Prof. C.L. WANG
We have a few PhD (or 4-year direct-entry PhD) positions open this year for
self-motivated and academically strong students. If
you are interested in one of the projects, please contact me at
clwang@cs.hku.hk. Interviews will be
arranged for qualified students.
- "An Informal Talk on New Perspectives and Technologies for Building Big Data and Cloud Computing Platforms" by C.L. Wang (05/10/2014):
PPT
1. Japonica: Java with Auto-Parallelization ON GraphIcs Coprocessing
Architecture (RGC-funded project: 11/2016-10/2019)
GPUs open up new opportunities for accelerating Java
programs for high-speed big data analytics. In this new
project, we will develop a portable Java library and runtime
environment, "Japonica+", to support GPU acceleration of
auto-parallelized loops with non-deterministic data
dependencies. The runtime parallelizes a
sequential Java program (with non-deterministic data
dependencies) into parallel workloads: Java threads or
OpenCL x86 kernels that run on the CPU, and OpenCL kernels that run on
the GPU concurrently, so that all CPU and GPU computing
resources are utilized. Tasks to be done: (1) automatic translation
from Java bytecode to OpenCL, (2)
auto-parallelization of loops with non-deterministic data
dependencies (see our GPU-TLS paper), (3) dynamic load scheduling and
rebalancing via task migration between CPU and GPU,
(4) virtual shared memory support between the host and
multiple GPU cards. The whole project will be developed on recent
Nvidia K40 and Pascal-based GPU cards.
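To illustrate what "non-deterministic data dependencies" means in task (2), here is a minimal Java sketch. It is not the Japonica+ implementation, and it uses a simple inspector/executor scheme rather than the speculative approach of the GPU-TLS paper: the index arrays are known only at run time, so the loop is inspected first and run in parallel only when no cross-iteration dependency is found. The class name, method names, and loop body are hypothetical, and Java parallel streams stand in for the CPU/GPU kernels.

```java
import java.util.Arrays;
import java.util.BitSet;
import java.util.stream.IntStream;

// Hypothetical sketch: inspector/executor for a loop whose data
// dependencies are only known at run time (via index arrays r and w).
class SpeculativeLoopSketch {

    // Sequential reference loop: a[w[i]] = a[r[i]] + 1.
    static void runSequential(int[] a, int[] r, int[] w) {
        for (int i = 0; i < r.length; i++) {
            a[w[i]] = a[r[i]] + 1;
        }
    }

    // Inspector: true only if no two different iterations touch the
    // same element with at least one of them writing it.
    static boolean isIndependent(int[] r, int[] w, int size) {
        BitSet written = new BitSet(size);
        for (int idx : w) {
            if (written.get(idx)) {
                return false; // two iterations write the same element
            }
            written.set(idx);
        }
        for (int i = 0; i < r.length; i++) {
            // Reading an element written by a different iteration is a
            // cross-iteration (read/write) dependency.
            if (written.get(r[i]) && r[i] != w[i]) {
                return false;
            }
        }
        return true;
    }

    // Executor: parallel if the inspector found no dependency,
    // sequential otherwise. Returns true if the parallel path was taken.
    static boolean run(int[] a, int[] r, int[] w) {
        if (isIndependent(r, w, a.length)) {
            IntStream.range(0, r.length).parallel()
                     .forEach(i -> a[w[i]] = a[r[i]] + 1);
            return true;
        }
        runSequential(a, r, w);
        return false;
    }

    public static void main(String[] args) {
        int[] a = {0, 10, 20, 30};
        boolean parallel = run(a, new int[]{0, 1}, new int[]{2, 3});
        System.out.println(parallel + " " + Arrays.toString(a));

        int[] b = {0, 10, 20, 30};
        // Iteration 1 reads b[2], which iteration 0 writes.
        parallel = run(b, new int[]{0, 2}, new int[]{2, 3});
        System.out.println(parallel + " " + Arrays.toString(b));
    }
}
```

The inspector here is deliberately conservative: any possible conflict forces the sequential path. A speculative runtime would instead run the iterations optimistically and re-execute only those that turn out to conflict.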
Ph.D (or 4-year direct-entry PhD):
Solid background in compiler techniques (e.g., loop
parallelization, dependency checking) and GPU hardware
architecture (Nvidia or AMD GPUs), and good experience in GPU
programming (CUDA or OpenCL).
1-2 RAs: you can apply at any time if you have the
above experience (especially in OpenCL compilation techniques).
Reference:
- OpenCL
https://www.khronos.org/opencl/,
https://developer.nvidia.com/opencl (Nvidia)
- Nvidia Tesla K40 GPU:
http://www.nvidia.com/object/tesla-servers.html
- HKU GPU Cluster: 12 x IBM iDataPlex dx360 M3 servers
connected by a QDR IB switch, each with one Nvidia M2050
GPU. (See:
http://i.cs.hku.hk/~clwang/Gideon-II/)
- Previous Japonica
(Java with Auto-Parallelization ON GraphIcs
Co-processing Architecture) project:
http://i.cs.hku.hk/~clwang/projects/Japonica.html
- Hongyuan Liu, King Tin Lam, Huanxin Lin, Cho-Li
Wang, “Lightweight Dependency Checking for Parallelizing
Loops with Non-Deterministic Dependency on GPU”, to appear
in ICPADS2016.
- Huanxin Lin, Hongyuan Liu, Cho-Li Wang,
“On-the-Spot Branch Divergence Reduction Using On-GPU
Thread-Data Remapping”, submitted to ASPLOS ’17.
- Guodong Han, Chenggang Zhang, King Tin Lam, and Cho-Li Wang,
Java with Auto-Parallelization on Graphics Coprocessing Architecture, 42nd International Conference on Parallel Processing (ICPP2013), October 1-4, 2013, Lyon, France.
(pdf)
- Chenggang Zhang, Guodong Han, Cho-Li Wang,
GPU-TLS:
An Efficient Runtime for Speculative Loop Parallelization on GPUs, 13th
IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGrid),
May 13--16, 2013, Delft, The Netherlands. (pdf)
- Previous projects on Distributed Java Virtual Machines:
http://www.cs.hku.hk/~clwang/projects/JESSICA2.html
- TrC-MC: Software Transactional Memory on Multi-core.
- Kinson Chan, King Tin Lam, Cho-Li Wang,
Cache-efficient adaptive concurrency control
techniques for software transactional memory on
multi-CMP systems, submitted to Concurrency
Computat.: Pract. Exper., 2015.
- K. Chan, K. T. Lam, and C.-L. Wang, Cache Affinity
Optimization Techniques for Scaling Software
Transactional Memory Systems on Multi-CMP
Architectures, The 14th International Symposium on
Parallel and Distributed Computing, June 29-July 1,
2015, Limassol, Cyprus.
2. Software Architecture for Fault-Tolerant Multicore Computing
with Hybridized Non-Volatile Memories (HK GRF: 9/2015-8/2018)
In this project, we
propose a new multicore architecture with a two-level memory
hierarchy (on-chip and off-chip) containing both non-volatile
STT-RAM and volatile SRAM/DRAM. We will investigate the
challenges that such hybridized memory hardware poses to the
design of system software architectures and the associated
programming model for reliable big data computing. Specifically,
we plan to modify the Linux kernel to provide native NVM
management for the upper software layers, and to develop a
data-centric fault-tolerant software system for reliable
MapReduce-like programming.
Reference:
Project Webpage
Recent progress:
- Mingzhe Zhang, King Tin Lam and C.L. Wang, NVMice: A
Non-Volatile Memory-based Instant Checkpointing
Environment (in preparation)
Ph.D (or 4-year direct-entry PhD): strong
background in OS kernels and a background in fault-tolerance
protocols. RA: needed.
3. Crocodiles: Scalable Cloud-on-Chip
Runtime Support with Software Coherence for Future 1000-Core
Tiled Architectures (HKU 716712E,
9/2012-8/2015, supported by HK RGC)
Scaling up to parallelism across
1,000 cores requires a fairly radical rethinking of how to
design system software. With a growing number of cores,
providing hardware-level cache coherence gets increasingly
complicated and costly, leading researchers to promote
abandoning it if future many-core architectures are to stay
inherently scalable. That means software now has to take on
the role of ensuring data coherence among cores. In this research,
we address the above issues and propose novel methodologies to
build a scalable CoC ("Cloud on Chip") runtime platform, dubbed
Crocodiles (Cloud
Runtime with Object Coherence On Dynamic tILES), for
future 1000-core tiled processors.
Crocodiles
involves the development of two important software
subsystems:
(1) a cache coherence protocol and (2)
DVFS-based power management. (Currently, 3 Ph.D students
are working on this project.)
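To make the idea of software taking over data coherence more concrete, here is a minimal Java sketch. It is not the Crocodiles protocol: it only simulates per-core private caches that are kept coherent by explicit release (write back dirty data) and acquire (invalidate clean copies) operations, in the spirit of release consistency. All class and method names are hypothetical, and plain Java objects stand in for cores and off-chip memory.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical sketch: software-managed coherence over non-coherent caches.
class SoftwareCoherenceSketch {

    // Stand-in for off-chip shared memory visible to every core.
    static final class SharedMemory {
        private final int[] words;
        SharedMemory(int size) { words = new int[size]; }
        synchronized int load(int addr) { return words[addr]; }
        synchronized void store(int addr, int value) { words[addr] = value; }
    }

    // One core with a private cache; no hardware keeps it coherent.
    static final class Core {
        private final SharedMemory memory;
        private final Map<Integer, Integer> cache = new HashMap<>();
        private final Set<Integer> dirty = new HashSet<>();

        Core(SharedMemory memory) { this.memory = memory; }

        // Reads hit the private cache and fetch from memory only on a miss.
        int read(int addr) {
            return cache.computeIfAbsent(addr, memory::load);
        }

        // Writes stay in the private cache until the next release.
        void write(int addr, int value) {
            cache.put(addr, value);
            dirty.add(addr);
        }

        // Release: write dirty words back so other cores can observe them.
        void release() {
            for (int addr : dirty) {
                memory.store(addr, cache.get(addr));
            }
            dirty.clear();
        }

        // Acquire: drop clean cached copies so later reads refetch them.
        // Dirty words are kept so that local writes are not lost.
        void acquire() {
            cache.keySet().retainAll(dirty);
        }
    }

    public static void main(String[] args) {
        SharedMemory memory = new SharedMemory(4);
        Core producer = new Core(memory);
        Core consumer = new Core(memory);

        producer.write(0, 42);
        System.out.println(consumer.read(0)); // write not released yet
        producer.release();
        System.out.println(consumer.read(0)); // stale cached copy
        consumer.acquire();
        System.out.println(consumer.read(0)); // refetched after acquire
    }
}
```

In this sketch the cost of coherence is paid only at synchronization points, which is the usual argument for moving coherence into software on chips with very many cores.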
Ph.D (or 4-year direct-entry PhD):
strong background in OS kernels, thorough knowledge
of the memory subsystem (cache/DRAM, paging), and cache coherence
protocols.
Also required: a strong background in
software distributed shared memory systems (e.g.,
TreadMarks, JiaJia,
JUMP) and
programming experience in multicore power management systems.
Publications and recent effort:
- Z. Lai, K. T. Lam, C.-L. Wang, and J.
Su, "Latency-aware DVFS for Efficient Power State
Transitions on Many-core Architectures," Journal of
Supercomputing, Vol. 71, No. 7, pp. 2720-2747, July 2015.
- Z. Lai, K. T. Lam, C.-L. Wang, and J. Su, "PoweRock:
Power Modeling and Flexible Dynamic Power Management for
Many-core Architectures," IEEE Systems Journal, Issue:
99, pp. 1-13, 20 January 2016.
- Z. Lai, K. T. Lam, C.-L. Wang, and
J. Su, Power and Performance Analysis of the Graph
500 Benchmark on the Single-chip Cloud Computer,
International Conference on Cloud Computing and Internet
of Things (CCIOT '14), Changchun, China, pp. 9-13, 12/2014.
- Z. Lai, K. T. Lam, C.-L. Wang, and
J. Su, A Power Modeling Approach for Many-Core
Architectures, 10th International Conference on Semantics,
Knowledge and Grids (SKG '14), Beijing; pp. 128 – 132,
27-29 August 2014.
- “Rhymes: A Shared Virtual
Memory System for Non-Coherent Tiled Many-Core Architectures,”
ICPADS2014.
- “Latency-aware Dynamic
Voltage and Frequency Scaling on Many-core Architecture for
Data-intensive Applications”,
CloudCom-Asia 2013, Fuzhou, China, Dec. 16-18, 2013.
4. OS-1K: New
Operating System for Manycore Systems
(“NoHype” Cloud Operating System:
"Macedonian Phalanx" Decoupled Operating System)
Traditional operating systems are based on the
sequential execution model developed in the 1960s. Such
operating systems cannot address new many-core parallel
hardware architectures without major redevelopment. For
instance, how can you harness the power of a next-generation
manycore processor with >1,000 cores? We will investigate
various perspectives on future OS design towards this goal.
We are developing an x86-based full-system simulator on top of
Gem5, modifying the Gem5 simulator to simulate
the 1000-core chip. (Currently, one M.Sc student is working
on this project.)
Ph.D (or 4-year
direct-entry PhD): must have some experience in OS kernel development.
Some experience with Barrelfish or sccLinux will be quite
helpful. The student is also required to have good knowledge of multicore
hardware architecture.
Updated: May 08, 2015