Department of Computer Science, The University of Hong Kong (HKU), Room 421, Chow Yei Ching Building, HKU, Pok Fu Lam Road, Hong Kong
I am an associate professor in the Department of Computer Science at HKU. I received my bachelor's and master's degrees from Tsinghua University and my PhD from Columbia University, where my supervisor was Prof. Junfeng Yang; I joined HKU in January 2015, right after completing my PhD. I lead the HKU Systems Software Lab. I am interested in building parallel and distributed systems, including blockchain systems, distributed AI training/serving systems, distributed big-data and parallel computing systems, and cloud computing systems, with a particular focus on improving the reliability, security, and performance of these systems. I publish papers in broad areas of systems, security, and networking, at venues including SOSP, NSDI, ASPLOS, ATC, EuroSys, DSN, TPDS, and TDSC. Over the past three years, I have served on the program committees of international systems/networking conferences, including OSDI, NSDI, ATC, DSN, EuroSys, SOCC, and ICDCS. I also serve as a regular reviewer for international systems/networking/software/security journals, including TPDS, TOCS, TSE, TON, TMC, and TDSC. I have received several internationally competitive research awards, including a Croucher Innovation Award in 2016 (HK $5 million), an outstanding (best) paper award from ACSAC '17, two Huawei flagship research grants in 2018 (blockchain and security) and 2021 (AI), the Best Collaborating Scientist Medal from the Huawei Theory Lab in 2021, and the RGC Research Impact Fund (RIF) in 2023 (HK $4.3 million). I am the project leader of a National Key R&D Program of China 2030 project on building new parallel training systems for large AI models. The competitive research grants for which I am the project leader or principal investigator (or one of them) in Hong Kong and mainland China total about HK $90 million.
My recent research papers have led to commercial software releases with leading global IT companies. For instance, the systems from my secure systems papers (e.g., [Uranus AsiaCCS 2020] and [DAENet TDSC 2021]) on Trusted Execution Environments have become a core component of Huawei's Trusted and Intelligent Cloud Services (see the UTEE component in TICS). In addition, my students and I actively collaborate with industry to jointly publish research papers and to transfer the resulting systems into commercial software in broad areas, including distributed AI training systems, permissioned blockchain systems, security and privacy-preserving systems, and geo-distributed transaction systems.
I admit several PhD students every year. I expect my students to have good skills and experience in hacking systems software (e.g., Linux kernel, LLVM, or distributed protocols) or AI frameworks, and to be strongly motivated to do research. If you are interested, please apply directly here and select "systems and networking research" as your field of interest in the application. If you also want to talk with me individually, please read my recent papers (each of them at least several times), understand deeply how they work, compile and run the systems, and then email me the new research topics you can think of (e.g., new applications or significant improvements of my systems, or other relevant and crazy ideas). I will reply to your email quickly if your ideas make sense. I also recruit postdocs in broad systems and networking areas. Please read my papers, form a few short research proposals (ideas/plans) at the intersection of your work and mine, and send me your CV with the proposals.
I have several well-funded research grants that can support student Research Assistants (RAs) and summer research interns from around the world. If you are interested and can work full-time at HKU for a few months, please email me your CV and your thoughts on my papers.
BIDL: A High-throughput, Low-latency Permissioned Blockchain Framework for Datacenter Networks
[pdf | video | code]
Proceedings of the 28th ACM Symposium on Operating Systems Principles 2021 (SOSP '21). ACM results reproduced badge.
Fold3D: Rethinking and Parallelizing Computational and Communicational Tasks in the Training of Large DNN Models
[pdf | slides | code]
IEEE Transactions on Parallel and Distributed Systems 2023 (TPDS '23)
JITfuzz: Coverage-guided Fuzzing for JVM Just-in-Time Compilers
Proceedings of the 45th International Conference on Software Engineering (ICSE '23)
CRONUS: Fault-isolated, Secure and High-performance Heterogeneous Computing for Trusted Execution Environments
[pdf | slides | video | code]
Jianyu Jiang, Qi Ji, Tianxiang Shen, Xusheng Chen, Shixiong Zhao, Sen Wang, Li Chen, Gong Zhang, Xiapu Luo, Heming Cui*
Proceedings of the 55th ACM/IEEE International Symposium on Microarchitecture (MICRO '22). ACM results reproduced badge.
ROG: A High Performance and Robust Distributed Training System for Robotic IoT
[pdf | slides | video | code]
Xiuxian Guan, Zekai Sun, Shengliang Deng, Xusheng Chen, Shixiong Zhao*, Zongyuan Zhang, Tianyang Duan, Yuexian Wang, Chenshu Wu, Yong Cui, Libo Zhang, Yanjun Wu, Rui Wang, Heming Cui
Proceedings of the 55th ACM/IEEE International Symposium on Microarchitecture (MICRO '22). ACM results reproduced badge.
A Geography-Based P2P Overlay Network for Fast and Robust Blockchain Systems
[paper]
Haoran Qiu, Tao Ji, Shixiong Zhao*, Xusheng Chen*, Ji Qi, Heming Cui, Sen Wang
IEEE Transactions on Services Computing 2022 (TSC '22)
SOTER: Guarding Black-box Inference for General Neural Networks at the Edge
[pdf | slides | code]
Proceedings of the 2022 USENIX Annual Technical Conference (ATC '22)
NASPipe: High Performance and Reproducible Pipeline Parallel Supernet Training via Causal Synchronous Parallel
[pdf | video | code]
The 2022 Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '22). ACM results reproduced badge.
vPipe: A Virtualized Acceleration System for Achieving Efficient and Scalable Pipeline Parallel DNN Training
[pdf | code]
IEEE Transactions on Parallel and Distributed Systems 2021 (TPDS '21)
COORP: Satisfying Low-Latency and High-Throughput Requirements of Wireless Network for Coordinated Robotic Learning
[pdf | code]
IEEE Internet of Things Journal 2022 (IOT-J '22)
Evaluating and Improving Neural Program-Smoothing-based Fuzzing
[pdf]
Proceedings of the 44th International Conference on Software Engineering (ICSE '22)
One Fuzzing Strategy to Rule Them All
[pdf]
Proceedings of the 44th International Conference on Software Engineering (ICSE '22)
Achieving Low Tail-latency and High Scalability for Serializable Transactions in Edge Computing
[pdf | video | code]
Proceedings of the European Conference on Computer Systems 2021 (EuroSys '21). ACM results reproduced badge.
Efficient and DoS-resistant Consensus for Permissioned Blockchains
[pdf | code]
Proceedings of the 39th International Symposium on Computer Performance, Modeling, Measurements and Evaluation 2021 (Performance '21)
Securing Big Data Scientific Workflows via Trusted Heterogeneous Environments
[paper]
IEEE Transactions on Dependable and Secure Computing 2022 (TDSC '22)
DAENet: Making Strong Anonymity Scale in a Fully Decentralized Network
[paper | code]
IEEE Transactions on Dependable and Secure Computing 2021 (TDSC '21)
vSMT-IO: Improving I/O Performance and Efficiency on SMT Processors in Virtualized Clouds
[link]
Proceedings of the 2020 USENIX Annual Technical Conference (ATC '20)
HAMS: High Availability for Distributed Machine Learning Service Graphs
[pdf | video | code]
Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '20)
UPA: An Automated, Accurate and Efficient Differentially Private Big-data Mining System
[pdf | video | code]
Proceedings of the 50th IEEE/IFIP International Conference on Dependable Systems and Networks (DSN '20)
Uranus: Simple, Efficient SGX Programming and Its Applications
[pdf | code]
Proceedings of the 15th ACM ASIA Conference on Computer and Communications Security (ASIACCS '20)
Fulva: Efficient Live Migration for In-memory Key-Value Stores with Zero Downtime
[pdf]
Proceedings of the 38th International Symposium on Reliable Distributed Systems (SRDS '19)
NFVactor: A Resilient NFV System using the Distributed Actor Model
[pdf]
IEEE Journal on Selected Areas in Communications (JSAC) 2019
Effectively Mitigating I/O Inactivity in vCPU Scheduling
[pdf | code]
Proceedings of the 2018 USENIX Annual Technical Conference (ATC '18)
PLOVER: Fast, Multi-core Scalable Virtual Machine Fault-tolerance
[pdf | video | code]
Proceedings of the 15th USENIX Symposium on Networked Systems Design and Implementation 2018 (NSDI '18)
OWL: Understanding and Detecting Concurrency Attacks
[pdf | code]
Proceedings of the 48th IEEE/IFIP International Conference on Dependable Systems and Networks 2018 (DSN '18)
How Local Information Improves Rendezvous in Cognitive Radio Networks
[pdf]
Proceedings of the IEEE International Conference on Sensing, Communication and Networking 2018 (SECON '18)
APUS: Fast and Scalable PAXOS on RDMA
[pdf | video | code]
Proceedings of the ACM Symposium on Cloud Computing (SOCC '17), 2017
Kakute: A Precise, Unified Information Flow Analysis System for Big-data Security
[pdf | video | code]
Proceedings of the Annual Computer Security Applications Conference (ACSAC '17), 2017. Best paper award!
Confluence: Speeding Up Iterative Distributed Operations by Key-dependency-aware Partitioning
IEEE Transactions on Parallel and Distributed Systems 2017 (TPDS '17)
Paxos Made Transparent
[pdf | code]
Proceedings of the 25th ACM Symposium on Operating Systems Principles (SOSP '15), 2015
Parrot: a Practical Runtime for Deterministic, Stable, and Reliable Threads
[pdf | slides | video | code]
Proceedings of the 24th ACM Symposium on Operating Systems Principles (SOSP '13), 2013
Determinism Is Not Enough: Making Parallel Programs Reliable with Stable Multithreading
Communications of the ACM (2014)
Verifying Systems Rules Using Rule-Directed Symbolic Execution
[pdf | slides]
Eighteenth International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS '13), 2013
Sound and Precise Analysis of Parallel Programs through Schedule Specialization
[pdf]
Proceedings of the 33rd ACM SIGPLAN Conference on Programming Language Design and Implementation (PLDI '12), 2012
Efficient Deterministic Multithreading through Schedule Relaxation
[pdf | slides]
Proceedings of the 23rd ACM Symposium on Operating Systems Principles (SOSP '11), 2011
Stable Deterministic Multithreading through Schedule Memoization
[pdf | slides]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), 2010
Bypassing Races in Live Applications with Execution Filters
[pdf]
Proceedings of the Ninth Symposium on Operating Systems Design and Implementation (OSDI '10), 2010