
1. Introduction

Motivated by the desire to handle larger and more complex problems, as well as to solve problems faster, we make use of multiple processing units for parallel computation. The development of parallel computers, especially their architecture, has been an active research subject since the early 1960s. During these forty years, numerous parallel architectures have been developed, evolved, and faded out, for example, systolic architectures, dataflow architectures, and transputer systems. Only in recent decades, owing to rapid improvements in VLSI technology, has there been a clear convergence of parallel machines toward a generic parallel machine organization [32]. In this generic architecture, a parallel machine is essentially a collection of complete computers, each with one or more processors and memory, interconnected by a communication network.

Advances in networking technology have accelerated this convergence. We are now able to transform a pool of off-the-shelf computers into a powerful platform for high-performance computation. This kind of computing platform is commonly known as a Cluster of Workstations (COW), a Network of Workstations (NOW), or simply a cluster [80]. By the name "commodity cluster", we refer to clusters that are built from off-the-shelf components, such as high-performance microprocessors and high-speed networks. Because they are built from commodity components, clusters offer a better price/performance ratio than traditional parallel machines, and this is the main incentive for building them. In addition, the performance of cluster systems keeps pace with advances in commodity hardware.

These features are the selling points of clusters for high-performance computing; however, simply putting state-of-the-art components together does not guarantee a cost-effective, high-performance system. The real challenge is how well we can harness these computing resources to meet our performance needs. In building clusters for high-performance computing, we have to face a number of performance-related challenges in the cluster domain. In particular,

Software design for performance - Developing application programs for performance is known to be difficult on traditional parallel machines. We expect it to remain a major challenge in the cluster domain, since a cluster is just another type of parallel platform.
System design for performance - Building clusters with the most advanced processors and networks promises the highest performance; however, the actual performance delivered to end-users often falls short of that promise. Exploiting these technologies in the most efficient and cost-effective way poses a great challenge to system designers.
Performance scalability - Architecturally, clusters can easily be grown in size, especially through incremental expansion, which often matches yearly financial plans [7]. This creates another challenge in building clusters for commodity supercomputing: how can we predict the performance of the scaled-up cluster using only the information available from a small-scale prototype?
Performance evaluation and analysis - Performance evaluation and analysis are core activities in the software development cycle. A major objective of practical performance evaluation and analysis is to identify potential bottlenecks in the target application/platform pair. Based on these analyses, inefficient algorithms are excluded, and good candidates are coded, debugged, and tested on the target platform. The difficulty is how to support performance evaluation and analysis on cluster systems so as to avoid the high cost of an iterative development cycle of design, coding, debugging, and testing.

We believe that effective parallel programming on the cluster platform requires the ability to measure the performance of parallel applications, the ability to determine the performance capability of cluster systems, and the ability to explain the performance behavior of a parallel application on a cluster system. This demands that system designers and programmers have an in-depth understanding of the interactions between the various hardware and software components. In this thesis, we rely on a realistic communication model to guide our understanding and structure our reasoning, as well as to carry out performance tuning. This model is used as a versatile tool for performance evaluation and prediction, as well as for algorithm design and analysis.

The rest of this chapter is organized as follows. We first state our motivation for building commodity clusters, and describe the limitations and challenges we have to tackle in order to achieve our goal. Next, we present the thesis statement and highlight the contributions of this thesis. Lastly, we outline the organization of the thesis.

