The HKU Gideon-II Clusters
Gideon 300 Retired (2002-2009)
The Department of Computer Science (CS) and the Computer Centre (CC) are jointly establishing a grid computing system to support systems research and compute-intensive scientific applications in various disciplines at HKU.
Currently, there are a total of 240 servers with a total computing rate of 19.43 Tflops (Phase 1), which is expected to grow to 25+ Tflops upon Phase 2's completion in early 2010.
Cluster Configuration
The Gideon-II cluster (managed by CS) consists of 48 1U IB nodes, 64 blades (in 4 full chassis), and 12 Fermi-based GPU nodes, with a total peak performance of 16.8 Tflops.
The 48 1U IB nodes are connected by a 48-port DDR InfiniBand switch, while the 64 blades are connected by a 132-port Gigabit Ethernet switch. All the IB nodes have a second NIC connecting to the same 132-port Gigabit Ethernet switch.
The GPU cluster, installed in late 2010, consists of 12 IBM iDataPlex dx360 M3 servers connected by a QDR IB switch; each node has one Nvidia M2050 GPU.
Currently, the three clusters are connected to other HPC centers in Mainland China to form a nationwide grid computing infrastructure, the China National Grid (CNGrid).
The MDRP cluster (managed by the HKU Computer Centre) consists of 128 blades housed in 8 chassis, two of them with a built-in InfiniBand switch and the rest connected by internal Gigabit Ethernet switches.
Inter-chassis connection is through a 24-port 10GbE switch.
Each blade has a theoretical peak computation rate of 80.96 Gflops.
The new cluster will serve as the main platform for cutting-edge compute-intensive research in quantum chemistry, computational physics, computational fluid dynamics, financial engineering, environmental studies, global hydrology and nanometer studies.
Node configuration:
Each node has 2 x Intel Nehalem-based quad-core Xeon 2.53 GHz CPUs, 32 GB of 1066 MHz DDR3 RAM, and SAS disks in RAID-1.
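For reference, the quoted per-node peak of 80.96 Gflops follows from this configuration if one assumes the usual 4 double-precision floating-point operations per core per cycle on Nehalem: 2 sockets x 4 cores x 2.53 GHz x 4 flops/cycle = 80.96 Gflops.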
OS:
Scientific Linux 5.3 64-bit on all MDRP machines
Fedora 11 64-bit on all SRG machines, except the IB cluster.
File/backup servers:
The file/backup servers use SAS disks in a RAID-5 configuration.
Management Networking:
A hierarchical network is formed around two 24-port 10GbE switches, which handle inter-chassis connections and also provide connectivity to the file and storage servers.
Single Linux Image Management (SLIM):
The host OSes of the compute nodes are disseminated on demand from a central SLIM server.
BIOS setting:
On all nodes in the CS clusters, the following are enabled in the BIOS:
(i) network booting (PXE in particular) on all GbE interfaces; and (ii) the Intel VT feature.
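As a quick post-deployment sanity check (an illustrative snippet, not part of the official setup), one can confirm that a node's CPU advertises Intel VT-x by looking for the vmx flag in /proc/cpuinfo; note that the flag may still be present when VT is locked off in the BIOS, in which case the kvm kernel module will report the problem when loaded.

# Illustrative check: does this node's CPU advertise Intel VT-x (vmx flag)?
# A missing flag means no VT support; a present flag does not guarantee the
# BIOS has it enabled (kvm will warn about that at module load time).
with open("/proc/cpuinfo") as f:
    has_vmx = any("vmx" in line.split() for line in f if line.startswith("flags"))
print("VT-x advertised:", has_vmx)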
GPU-cluster:
12 x IBM iDataPlex dx360 M3 servers connected by a QDR IB switch, each with one Nvidia M2050 GPU.
Host node configuration:
- 2 x Intel 6-core Xeon (X5650)
- 48 GB ECC DDR3 RAM
- 250 GB 7200 RPM 3.5" HDD
Network:
- IB switch: Qlogic 12300 18-port InfiniBand 4X QDR switch
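For reference, these specs are consistent with the benchmark figures below under common peak-rate assumptions (the X5650's 2.66 GHz clock with 4 double-precision flops per core per cycle, and the commonly quoted 515 Gflops double-precision peak of the M2050): each host node peaks at about 2 x 6 x 2.66 GHz x 4 ≈ 127.7 Gflops, i.e. roughly 1.53 Tflops for 12 hosts, while 12 GPUs contribute about 6.18 Tflops. Together with the 112 CPU-only nodes at 80.96 Gflops each, this also accounts for the 16.8 Tflops total peak quoted above.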
On-going Research Projects
- Japonica: Transparent Runtime and Memory Coherence Support for GPU-Based Heterogeneous Many-Core Architecture (11/2011-10/2013)
- JESSICA4: a distributed Java virtual machine with transparent thread migration and large object space support on multicore clusters
- SLIM-VM: fast dissemination of the Linux OS to networked machines across a grid
- Process Roaming: lightweight process migration at the OS kernel level
- MPI Checkpointing: Group-based Checkpoint/Restart for Large-Scale Message-passing Systems
- TrC-MC: adaptive software transactional memory model for multi-core computing
- Cloud Computing Projects
Gideon 300 retired on July 8, 2009 after 7 years of continuous service to the group.
HKU Grid Point Inauguration Ceremony (Video): 27/08/2010
Prof. Paul KH Tam, HKU Pro-Vice-Chancellor & Vice-President (Research)
Invited Speakers: Prof. Depei QIAN, Prof. Xuebin CHI, Prof. Gang CHEN
Performance Benchmark (Linpack / Peak = Efficiency)
- Gideon-II 64-node GbE cluster (via Foundry switch): 3.45 Tflops / 5.181 Tflops = 66%
- Gideon-II 64-node GbE cluster (via 10GbE switch): 3.115 Tflops / 5.181 Tflops = 60%
- Gideon-II 48-node IB cluster: 3.275 Tflops / 3.886 Tflops = 84%
- 12-node GPU cluster (with 12 Nvidia M2050 GPUs): 1.532 Tflops (CPU) + 6.180 Tflops (GPU) = 7.712 Tflops
- MDRP 32-node IB cluster: 2.210 Tflops / 2.590 Tflops = 85%
- MDRP 96-node Gigabit Ethernet cluster: 5.283 Tflops / 7.772 Tflops = 67%
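The efficiency column is simply the measured Linpack (HPL) rate divided by the theoretical peak. A minimal sketch of the calculation, using the per-node peak figure quoted above (the helper names are illustrative, not part of any cluster tooling):

# Linpack efficiency = measured Rmax / theoretical Rpeak.
# Node counts and the 80.96 Gflops per-node peak come from this page.

def rpeak_tflops(nodes, gflops_per_node):
    # Theoretical peak of a homogeneous cluster, in Tflops.
    return nodes * gflops_per_node / 1000.0

def efficiency(rmax_tflops, rpeak):
    return rmax_tflops / rpeak

print(rpeak_tflops(48, 80.96))                     # ~3.886 Tflops (48-node IB cluster)
print(efficiency(3.275, rpeak_tflops(48, 80.96)))  # ~0.84, i.e. 84%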
HKU Grid Point Construction: Phase 2
MDRP:
- 16 x IBM BladeCenter HS22 servers
- 2 x backup servers, x3620M3 (RAID-5 disk array)
- 8 x NFS servers, x3620M3 (RAID-6 disk array)
- OS: Scientific Linux 5.5 64-bit
SRG: 12-node Fermi-based GPU cluster
The delivery of the Phase 2 hardware was scheduled for late August 2010, with installation work to start afterwards.