1. Introduction
Motivated by the desire to handle larger and more complex problems, and
to solve problems faster, we make use of multiple processing
units for parallel computation. The development of parallel computers,
and of their architecture in particular, has been an active
research subject since the early 1960s. Over these forty years,
numerous parallel architectures have been developed, have evolved, and
have faded out, for example, systolic architectures, dataflow
architectures, and transputer systems. Only in recent decades, owing to
the swift improvement of VLSI technology, has there been a clear
convergence of parallel machines toward a generic parallel machine
organization [32]. In this generic architecture, a parallel machine
essentially comprises a collection of complete computers, each with one
or more processors and memory, interconnected by a communication network.
Advances in networking technology have accelerated this convergence.
We are now able to transform a pool of off-the-shelf computers
into a powerful platform for high-performance computation.
This kind of computing platform is commonly known as a Cluster of
Workstations (COW), a Network of Workstations (NOW), or simply a
cluster [80].
By the name "commodity cluster", we refer to clusters that
are built from off-the-shelf components, such as high-performance
microprocessors and high-speed networks. Because they are built from
commodity components, clusters offer a better price/performance ratio
than traditional parallel machines, and this is the main incentive for
building them. In addition, the performance of cluster systems keeps
pace with advances in commodity hardware.
These features are the selling points of clusters for high-performance
computing; however, simply putting state-of-the-art components together
does not guarantee a cost-effective, high-performance system. The
real challenge is how well we can harness these computing resources
to meet our performance needs. In building clusters for high-performance
computing, we have to face several performance-related challenges in the
cluster domain. In particular,
- Software design for performance - Developing application programs
for performance is known to be difficult on traditional parallel machines.
We expect it to remain a big challenge in the cluster domain,
as a cluster is just another type of parallel platform.
- System design for performance - Building clusters with the most advanced
processors and networks promises the best performance; however,
the actual performance delivered to end users often falls short of
that promise. How to exploit these technology resources
in the most efficient and cost-effective way poses a great challenge
to system designers.
- Performance scalability - On the architectural side, clusters can
easily grow in size, especially through incremental expansion that
often matches yearly financial plans [7]. This
creates another challenge in building clusters for commodity supercomputing:
how can we predict the performance of the scaled cluster using only
information available from a small-scale prototype?
- Performance evaluation and analysis - Performance evaluation and analysis
are core activities in the software development cycle. A major
objective of practical performance evaluation and analysis is to identify
potential bottlenecks in the target application/platform pair. Based
on these analyses, inefficient algorithms are excluded, and good candidates
are coded, debugged, and tested on the target platform. However, one
of the difficulties is how to support the performance evaluation and
analysis processes on cluster systems so as to avoid the high
cost of the iterative development cycle of design, coding, debugging, and
testing.
We believe that effective parallel programming on the
cluster platform requires the ability to measure the performance
of parallel applications, the ability to determine the performance
capability of cluster systems, and the ability to explain the
performance behavior of a parallel application on a cluster system.
This demands that system designers and programmers have an in-depth
understanding of the interactions between the various hardware and
software components. In this thesis, we rely on a realistic
communication model to guide our understanding, to structure our
reasoning, and to carry out performance tuning. This model is used as a
versatile tool for performance evaluation and prediction, as well as for
algorithm design and analysis.
The rest of this chapter is organized as follows. We first state our
motivation for building commodity clusters, and describe the limitations
and challenges we have to tackle in order to achieve our goal. Next,
we present the thesis statement and highlight the contributions of
this thesis. Lastly, we outline the organization of
this thesis.