The HKU Gideon-II Clusters
Gideon 300 Retired (2002-2009)
The Department of Computer Science (CS) and the Computer Centre (CC) are jointly establishing a grid computing system to support systems research and compute-intensive scientific applications in various disciplines at HKU.
Currently, there are a total of 240 servers with a total computing rate of 19.43 Tflops (Phase 1), which is expected to grow to 25+ Tflops upon Phase 2's completion in early 2010.
Cluster Configuration
The Gideon-II cluster (managed by CS) consists of 48 1U IB nodes, 64 blades (in 4 full chassis), and 12 Fermi-based GPU nodes, with a total peak performance of 16.8 Tflops.
The 48 1U IB nodes are connected by a 48-port DDR InfiniBand switch, while the 64 blades are connected by a 132-port Gigabit Ethernet switch. All the IB nodes have a second NIC connecting to the same 132-port Gigabit Ethernet switch.
The GPU cluster, installed in late 2010, consists of 12 IBM iDataPlex dx360 M3 servers connected by a QDR IB switch; each node has one Nvidia M2050 GPU.
Currently, the three clusters are connected to other HPC centers in Mainland China to form a nationwide grid computing infrastructure, the China National Grid (CNGrid).
The MDRP cluster (managed by the HKU Computer Centre) consists of 128 blades housed in 8 chassis, two of them with a built-in InfiniBand switch and the rest connected by internal Gigabit Ethernet switches.
Inter-chassis connection is through a 24-port 10GbE switch.
Each blade has a theoretical peak computation rate of 80.96 Gflops.
The new cluster will serve as the main platform for cutting-edge compute-intensive research in quantum chemistry, computational physics, computational fluid dynamics, financial engineering, environmental studies, global hydrology and nanometer studies.
Node configuration:
Each node has 2 x Intel Nehalem-based quad-core Xeon 2.53 GHz CPUs, 32 GB of 1066 MHz DDR3 RAM, and SAS disks in RAID-1.
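For reference, the quoted per-node peak of 80.96 Gflops follows from this configuration if one assumes the usual 4 double-precision floating-point operations per core per cycle on Nehalem: 2 sockets x 4 cores x 2.53 GHz x 4 flops/cycle = 80.96 Gflops.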
OS:
Scientific Linux 5.3 64-bit on all MDRP machines
Fedora 11 64-bit on all SRG machines, except the IB cluster.
File/backup servers:
The file/backup servers use SAS disks in a RAID-5 configuration.
Management Networking:
A hierarchical network is formed around two 24-port 10GbE switches, which handle inter-chassis connections and also provide connectivity to the file and storage servers.
Single Linux Image Management (SLIM):
The host OSes of the compute nodes are disseminated on demand from a central SLIM server.
BIOS setting:
On all nodes in the CS clusters, the following are enabled in the BIOS:
(i) network booting (PXE in particular) on all GbE interfaces; and (ii) the Intel VT feature.
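As a quick post-deployment sanity check (an illustrative snippet, not part of the official setup), one can confirm that a node's CPU advertises Intel VT-x by looking for the vmx flag in /proc/cpuinfo; note that the flag may still be present when VT is locked off in the BIOS, in which case the kvm kernel module will report the problem when loaded.

# Illustrative check: does this node's CPU advertise Intel VT-x (vmx flag)?
# A missing flag means no VT support; a present flag does not guarantee the
# BIOS has it enabled (kvm will warn about that at module load time).
with open("/proc/cpuinfo") as f:
    has_vmx = any("vmx" in line.split() for line in f if line.startswith("flags"))
print("VT-x advertised:", has_vmx)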
GPU-cluster:
12 x IBM iDataPlex dx360 M3 servers connected by a QDR IB switch, each with one Nvidia M2050 GPU.
Host node configuration:
- 2 x Intel 6-core Xeon (X5650)
- 48 GB ECC DDR3 RAM
- 250 GB 7200 RPM 3.5" HDD
Network:
- IB switch: Qlogic 12300 18-port InfiniBand 4X QDR switch
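For reference, these specs are consistent with the benchmark figures below under common peak-rate assumptions (the X5650's 2.66 GHz clock with 4 double-precision flops per core per cycle, and the commonly quoted 515 Gflops double-precision peak of the M2050): each host node peaks at about 2 x 6 x 2.66 GHz x 4 ≈ 127.7 Gflops, i.e. roughly 1.53 Tflops for 12 hosts, while 12 GPUs contribute about 6.18 Tflops. Together with the 112 CPU-only nodes at 80.96 Gflops each, this also accounts for the 16.8 Tflops total peak quoted above.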
On-going Research Projects
- Japonica: Transparent Runtime and Memory Coherence Support for GPU-Based Heterogeneous Many-Core Architecture (11/2011-10/2013)
- JESSICA4: a distributed Java virtual machine with transparent thread migration and large object space support on multicore clusters
- SLIM-VM: fast dissemination of the Linux OS to networked machines across a grid
- Process Roaming: lightweight process migration at the OS kernel level
- MPI Checkpointing: Group-based Checkpoint/Restart for Large-Scale Message-passing Systems
- TrC-MC: adaptive software transactional memory model for multi-core computing
- Cloud Computing Projects
Gideon 300 retired on July 8, 2009 after 7 years of continuous service to the group.
HKU Grid Point Inauguration Ceremony (Video): 27/08/2010
Prof. Paul KH Tam, HKU Pro-Vice-Chancellor & Vice-President (Research)
Invited Speakers: Prof. Depei QIAN, Prof. Xuebin CHI, Prof. Gang CHEN
Performance Benchmark (Linpack / Peak = Efficiency)
- Gideon-II 64-node GbE cluster (via Foundry switch): 3.45 Tflops / 5.181 Tflops = 66%
- Gideon-II 64-node GbE cluster (via 10GbE switch): 3.115 Tflops / 5.181 Tflops = 60%
- Gideon-II 48-node IB cluster: 3.275 Tflops / 3.886 Tflops = 84%
- 12-node GPU cluster (with 12 Nvidia M2050 GPUs): 1.532 Tflops (CPU) + 6.180 Tflops (GPU) = 7.712 Tflops
- MDRP 32-node IB cluster: 2.210 Tflops / 2.590 Tflops = 85%
- MDRP 96-node Gigabit Ethernet cluster: 5.283 Tflops / 7.772 Tflops = 67%
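The efficiency column is simply the measured Linpack (HPL) rate divided by the theoretical peak. A minimal sketch of the calculation, using the per-node peak figure quoted above (the helper names are illustrative, not part of any cluster tooling):

# Linpack efficiency = measured Rmax / theoretical Rpeak.
# Node counts and the 80.96 Gflops per-node peak come from this page.

def rpeak_tflops(nodes, gflops_per_node):
    # Theoretical peak of a homogeneous cluster, in Tflops.
    return nodes * gflops_per_node / 1000.0

def efficiency(rmax_tflops, rpeak):
    return rmax_tflops / rpeak

print(rpeak_tflops(48, 80.96))                     # ~3.886 Tflops (48-node IB cluster)
print(efficiency(3.275, rpeak_tflops(48, 80.96)))  # ~0.84, i.e. 84%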
HKU Grid Point Construction: Phase 2
MDRP:
- 16 x IBM BladeCenter HS22 servers
- 2 x backup servers, x3620M3 (RAID-5 disk array)
- 8 x NFS servers, x3620M3 (RAID-6 disk array)
- OS: Scientific Linux 5.5 64-bit
SRG: 12-node Fermi-based GPU cluster
The delivery of the Phase 2 hardware was scheduled for late August 2010, with installation work to start afterwards.