We aim at solving REAL problems on REAL systems.

The HKU Gideon-II Clusters

Gideon 300 Retired (2002-2009)

 

The Department of Computer Science (CS) and the Computer Centre (CC) are jointly establishing a grid computing system to support systems research and compute-intensive scientific applications across disciplines at HKU. Phase 1 currently comprises 240 servers with a total peak computing rate of 19.43 Tflops, and the system is expected to grow to 25+ Tflops upon Phase 2's completion in early 2010.

 

Cluster Configuration

The Gideon-II cluster (managed by CS) consists of 48 1U IB nodes, 64 blades (in 4 full chassis), and 12 Fermi-based GPU nodes. The total peak performance is 16.8 Tflops. The 48 1U IB nodes are connected by a 48-port DDR InfiniBand switch, while the 64 blades are connected by a 132-port Gigabit Ethernet switch. All the IB nodes have a second NIC connected to the same 132-port Gigabit Ethernet switch. The GPU cluster, installed in late 2010, consists of 12 IBM iDataPlex dx360 M3 servers connected by a QDR IB switch; each node has one Nvidia M2050 GPU. Currently, the three clusters are connected to other HPC centers in Mainland China to form a nation-wide grid computing infrastructure, the China National Grid (CNGrid).


The MDRP cluster (managed by the HKU Computer Centre) consists of 128 blades housed in 8 chassis; two of the chassis have a built-in InfiniBand switch, and the rest are connected by internal Gigabit Ethernet switches. Inter-chassis connection is through a 24-port 10GbE switch. Each blade has a theoretical computation rate of 80.96 Gflops. The new cluster will serve as the main platform for cutting-edge compute-intensive research in quantum chemistry, computational physics, computational fluid dynamics, financial engineering, environmental studies, global hydrology and nanometer studies.

Node configuration

Each node has 2 x Intel Nehalem-based quad-core Xeon 2.53GHz CPUs, 32 GB of 1066MHz DDR3 RAM, and SAS disks in RAID-1.
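
The per-blade figure of 80.96 Gflops and the cluster totals quoted on this page can be reproduced from the node configuration. The sketch below is only an illustration: it assumes each Nehalem core retires 4 double-precision floating-point operations per clock cycle, a value that is not stated on this page, and the function names are hypothetical.

```python
def peak_gflops(sockets, cores_per_socket, clock_ghz, flops_per_cycle=4):
    """Theoretical double-precision peak of one node, in Gflops.

    flops_per_cycle=4 is an assumption (not stated on this page):
    one 2-wide SSE add plus one 2-wide SSE multiply per cycle per core.
    """
    return sockets * cores_per_socket * clock_ghz * flops_per_cycle


def cluster_tflops(nodes, node_gflops):
    """Aggregate peak of `nodes` identical nodes, in Tflops."""
    return nodes * node_gflops / 1000.0


# 2 sockets x 4 cores x 2.53 GHz, as listed under "Node configuration".
node = peak_gflops(sockets=2, cores_per_socket=4, clock_ghz=2.53)
print(f"per node: {node:.2f} Gflops")

# Node counts as given in the cluster descriptions on this page.
for name, count in [
    ("Gideon-II IB nodes", 48),
    ("Gideon-II blades", 64),
    ("MDRP blades", 128),
    ("Phase 1 total", 240),
]:
    print(f"{name:20s} {count:4d} nodes  {cluster_tflops(count, node):7.3f} Tflops")
```

Under that assumption the per-node value comes out at 80.96 Gflops, and 240 nodes give about 19.43 Tflops, which agrees with the Phase 1 total quoted above.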

OS:

Scientific Linux 5.3 64-bit on all MDRP machines

Fedora 11 64-bit on all SRG machines, except the IB cluster.

 

File/backup servers:
The file/backup servers use SAS disks in RAID-5.

Management Networking:

A hierarchical network is built on two 24-port 10GbE switches, which provide the inter-chassis connection and connect to the file servers and storage servers.

Single Linux Image Management (SLIM) 

Host OSes of compute nodes are disseminated on demand from a central SLIM server.

 

BIOS setting:

On all nodes in the CS clusters, the following are enabled in the BIOS: (i) network booting (PXE in particular) for all GbE interfaces; and (ii) the Intel VT feature.
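
As a rough way to confirm item (ii) from a running Linux node, one can look for the `vmx` CPU flag in `/proc/cpuinfo`. The sketch below is an assumption about how such a check might be scripted, not a description of the group's own tooling; the function names are hypothetical. Note that the flag reports what the kernel sees for the CPU, and depending on the kernel version it may not reflect whether the firmware has actually enabled the feature.

```python
def cpu_flags(cpuinfo_text):
    """Return the set of CPU feature flags from /proc/cpuinfo-style text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            # The first "flags" line is enough; all cores normally match.
            return set(value.split())
    return set()


def supports_intel_vt(cpuinfo_text):
    """True if the Intel VT-x flag ('vmx') is listed among the CPU flags."""
    return "vmx" in cpu_flags(cpuinfo_text)


if __name__ == "__main__":
    try:
        with open("/proc/cpuinfo") as f:
            text = f.read()
    except OSError:
        text = ""  # not a Linux host, or /proc is unavailable
    print("Intel VT (vmx) reported:", supports_intel_vt(text))
```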

GPU-cluster: 

12 x IBM iDataPlex dx360 M3 servers connected by a QDR IB switch, each with one Nvidia M2050 GPU.

Host node configuration:
- 2 x Intel 6-core Xeon (X5650)
- 48 GB ECC DDR3 RAM
- 250GB 7200 RPM 3.5" HDD
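
The theoretical peak of the GPU cluster can be estimated from this host configuration. The sketch below rests on three inputs that are not all given in this section: the 2.66GHz clock of the X5650 (taken from the Phase 2 hardware list on this page), an assumed 4 double-precision flops per cycle per CPU core, and an assumed 515 Gflops double-precision peak per M2050 (Nvidia's published figure). The names are hypothetical.

```python
CPU_FLOPS_PER_CYCLE = 4       # assumption: DP flops per cycle per core
M2050_PEAK_GFLOPS = 515.0     # assumption: vendor-published DP peak per GPU


def gpu_cluster_peak(nodes=12, sockets=2, cores=6, clock_ghz=2.66,
                     gpus_per_node=1):
    """Return (cpu_tflops, gpu_tflops) theoretical peak for the cluster."""
    cpu = nodes * sockets * cores * clock_ghz * CPU_FLOPS_PER_CYCLE / 1000.0
    gpu = nodes * gpus_per_node * M2050_PEAK_GFLOPS / 1000.0
    return cpu, gpu


cpu_tf, gpu_tf = gpu_cluster_peak()
print(f"CPU:   {cpu_tf:.3f} Tflops")
print(f"GPU:   {gpu_tf:.3f} Tflops")
print(f"Total: {cpu_tf + gpu_tf:.3f} Tflops")
```

With these assumptions the CPU and GPU parts come to roughly 1.532 and 6.180 Tflops, consistent with the GPU-cluster entry in the benchmark listing on this page.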

 

Network:

IB switch: Qlogic 12300 18-port InfiniBand 4X QDR switch

 

On-going Research Projects

  • Japonica: Transparent Runtime and Memory Coherence Support for GPU Based Heterogeneous Many-Core Architecture (11/2011-10/2013)
  • JESSICA4: a distributed Java virtual machine with transparent thread migration and large object space support on multicore clusters.
  • SLIM-VM: fast dissemination of the Linux OS to networked machines across a grid.
  • Process Roaming: Lightweight process migration at OS kernel level
  • MPI Checkpointing: Group-based Checkpoint/Restart for Large-Scale Message-passing Systems
  • TrC-MC: Adaptive software transactional memory model for multi-core computing
  • Cloud Computing Projects
 

Gideon 300 retired on July 8, 2009 after 7 years of continuous service to the group.


HKU Grid Point Inauguration Ceremony (Video): 27/08/2010.
Prof. Paul KH Tam, HKU Pro-Vice-Chancellor & Vice-President (Research)
Invited Speakers: Prof. Depei QIAN, Prof. Xuebin CHI, Prof. Gang CHEN.

 
Performance Benchmark (Linpack / Peak = Efficiency)

  • Gideon-II 64-node GbE cluster (via Foundry switch): 3.45 Tflops / 5.181 Tflops = 66%
  • Gideon-II 64-node GbE cluster (via 10GbE switch): 3.115 Tflops / 5.181 Tflops = 60%
  • Gideon-II 48-node IB cluster: 3.275 Tflops / 3.886 Tflops = 84%
  • 12-node GPU cluster (with 12 Nvidia M2050 GPUs): 1.532 Tflops (CPU) + 6.180 Tflops (GPU) = 7.712 Tflops
  • MDRP 32-node IB cluster: 2.210 Tflops / 2.590 Tflops = 85%
  • MDRP 96-node Gigabit Ethernet cluster: 5.283 Tflops / 7.772 Tflops = 67%
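
The efficiency figures above are simply the measured Linpack rate divided by the theoretical peak. The short sketch below recomputes them from the listed numbers; the variable and function names are hypothetical. It prints both the unrounded ratio and the truncated integer percentage, since the listed values appear to be truncated rather than rounded.

```python
# (cluster, Linpack Tflops, peak Tflops), copied from the listing above.
RESULTS = [
    ("Gideon-II 64-node GbE (Foundry switch)", 3.45, 5.181),
    ("Gideon-II 64-node GbE (10GbE switch)", 3.115, 5.181),
    ("Gideon-II 48-node IB cluster", 3.275, 3.886),
    ("MDRP 32-node IB cluster", 2.210, 2.590),
    ("MDRP 96-node GbE cluster", 5.283, 7.772),
]


def efficiency(linpack_tflops, peak_tflops):
    """Fraction of theoretical peak achieved by the Linpack run."""
    return linpack_tflops / peak_tflops


for name, linpack, peak in RESULTS:
    eff = 100 * efficiency(linpack, peak)
    print(f"{name:40s} {eff:5.1f}%  (truncated: {int(eff)}%)")
```

The two InfiniBand clusters reach about 84-85% of peak, while the Gigabit Ethernet configurations stay in the 60-68% range, which is the comparison the listing is meant to show.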
     

HKU Grid Point Construction: Phase 2 

 

MDRP

  • 16 x IBM BladeCenter HS22 servers

  • 2 x backup server, x3620M3 (RAID-5 disk array)

  • 8 x NFS server, x3620M3 (RAID-6 disk array)

  • OS: Scientific Linux 5.5 64-bit

SRG: 12-node Fermi-based GPU cluster

  • Host: IBM iDataPlex dx360 M3 server

    • 2 x Intel 6-core Xeon processors (X5650, 2.66GHz)

    • 48GB ECC DDR3 1333MHz RAM

    • 250GB HDD (7200 RPM)

    • GPU: NVIDIA Tesla M2050

    • OS: Scientific Linux 5.5 64-bit

    • Network: 4X QDR InfiniBand (40 Gb/s)

  • 18-port Qlogic 12300 InfiniBand 4X QDR switch

  • 2 x NFS/storage servers: IBM x3620M3, each with 6 TB HD

Delivery of the Phase 2 hardware was scheduled for late August 2010, with installation work to start afterward.

 

Last Update: Friday, November 25, 2011

 

Copyright HKU CS Department