Load Balancing in Parallel Computers: Theory and Practice is about the essential software technique of load balancing in distributed memory message-passing parallel computers, also called multicomputers. Each processor has its own address space and has to communicate with other processors by message passing. In general, a direct, point-to-point interconnection network is used for the communications. Many commercial parallel computers are of this class, including the Intel Paragon, the Thinking Machines CM-5, and the IBM SP2.
Load Balancing in Parallel Computers: Theory and Practice presents a comprehensive treatment of the subject using rigorous mathematical analyses and practical implementations. The focus is on nearest-neighbor load balancing methods, in which every processor at every step is restricted to balancing its workload with its direct neighbors only. Nearest-neighbor methods are iterative in nature, because a global balanced state is reached through processors' successive local operations. Since nearest-neighbor methods have a relatively relaxed requirement for the spread of local load information across the system, they are flexible in allowing one to control the balancing quality, effective at preserving communication locality, and easily scaled in parallel computers with a direct communication network.
Load Balancing in Parallel Computers: Theory and Practice serves as an excellent reference source and may be used as a text for advanced courses on the subject.
The raw power of computers has kept increasing by leaps and bounds, but our ability to harness that power has not kept pace. We are perhaps too accustomed to solving problems sequentially, especially when using the computer. The gap must be bridged by advanced software. A huge amount of effort has been devoted by researchers worldwide to the development of software techniques for parallel computing. These researchers all share the common goal of making the use of parallel computers much less formidable and enabling the user to fully exploit their power. One such essential software technique is load balancing, which is the subject of this book. Load balancing aims at improving the performance of parallel computers by equalizing the workloads of processors automatically during the execution of parallel programs.
This book is about load balancing in distributed memory message-passing parallel computers, also called multicomputers. Each processor has its own address space and has to communicate with other processors by message passing. In general, a direct, point-to-point interconnection network is used for the communications. Many commercial parallel computers are of this class, including the Intel Paragon, the Thinking Machines CM-5, and the IBM SP2. This book presents a comprehensive treatment of the subject using rigorous mathematical analyses and practical implementations. The focus is on nearest-neighbor load balancing methods, in which every processor at every step is restricted to balancing its workload with its direct neighbors only. Nearest-neighbor methods are iterative in nature, because a global balanced state can be reached through processors' successive local operations. Since nearest-neighbor methods have a relatively relaxed requirement for the spread of local load information across the system, they are flexible in allowing one to control the balancing quality, effective at preserving communication locality, and easily scaled in parallel computers with a direct communication network.
In the design and analysis of nearest-neighbor load balancing algorithms, the two most important performance metrics are stability and efficiency. Stability measures the ability of the algorithm to coerce any initial workload distribution into a global balanced state in the static workload model, and the ability to bound the variance of processors' workloads in the dynamic workload model. Efficiency measures the time cost of arriving at the global balanced state or of reducing the variance to a certain level. The objective of this work is to design nearest-neighbor algorithms that have good stability and efficiency characteristics.
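These two metrics can be made concrete with a small simulation harness. The following sketch is our own illustration, not code from the book, and all names in it are hypothetical: the maximum deviation from the global average measures distance from the balanced state, and the number of iterations needed to push that deviation below a tolerance is a simple proxy for efficiency (failure to converge signals instability).

```python
def imbalance(loads):
    """Maximum deviation of any processor's load from the global average."""
    avg = sum(loads) / len(loads)
    return max(abs(w - avg) for w in loads)

def steps_to_balance(loads, step, tol=1e-6, max_steps=10_000):
    """Apply the balancing iteration `step` until the imbalance drops
    below `tol`; the iteration count is an efficiency measure.
    Returning None signals that the iteration failed to converge."""
    for k in range(max_steps):
        if imbalance(loads) < tol:
            return k
        loads = step(loads)
    return None

# An illustrative local-averaging step on a ring of processors: each
# processor replaces its load by the mean of itself and its two neighbors.
def ring_average_step(loads):
    n = len(loads)
    return [(loads[(i - 1) % n] + loads[i] + loads[(i + 1) % n]) / 3
            for i in range(n)]

print(steps_to_balance([10.0, 0.0, 0.0, 0.0], ring_average_step))
```

Any nearest-neighbor scheme can be plugged in as `step`, so the same harness compares different methods under the same stability and efficiency yardsticks.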
Two of the best-known nearest-neighbor load balancing algorithms are the dimension exchange and the diffusion methods. With the dimension exchange method, a processor balances its workload with its nearest neighbors one at a time, taking them in turn. With the diffusion method, a processor communicates simultaneously with all its nearest neighbors in order to reach a local balance. These two methods are rigorously analyzed in this book, resulting in optimal tunings of the methods for a number of popular interconnection networks. On the practical side, the two methods are implemented on multicomputers with different characteristics and evaluated in applications with different behaviors. The methods are shown to be both effective and efficient.
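As a concrete illustration (our own sketch, not the book's code), here is one sweep of the dimension exchange method simulated sequentially on a hypercube, the network for which the method is most naturally stated. The exchange parameter `lam` controls how each pair splits its loads; `lam = 0.5` reduces to pairwise averaging, which balances a hypercube exactly in a single sweep.

```python
def dimension_exchange_sweep(loads, lam=0.5):
    """One sweep of dimension exchange on a hypercube of 2**d processors.

    In dimension k, processor i pairs with neighbor j = i XOR 2**k and the
    pair rebalances:  w_i <- (1 - lam)*w_i + lam*w_j  (and symmetrically
    for w_j), so lam = 0.5 is plain pairwise averaging.
    """
    n = len(loads)
    d = n.bit_length() - 1
    assert 1 << d == n, "a hypercube needs a power-of-two processor count"
    loads = list(loads)
    for k in range(d):               # one dimension at a time
        for i in range(n):
            j = i ^ (1 << k)
            if i < j:                # visit each pair once
                wi, wj = loads[i], loads[j]
                loads[i] = (1 - lam) * wi + lam * wj
                loads[j] = lam * wi + (1 - lam) * wj
    return loads

# With lam = 0.5, one full sweep balances the hypercube exactly:
print(dimension_exchange_sweep([8.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0, 0.0]))
# -> [1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0, 1.0]
```

On a real multicomputer the pair exchanges within one dimension happen in parallel; the sequential loop here only mimics that schedule for study.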
For other popular networks (the ring, the chain, the mesh, the torus, and the k-ary n-cube), we derive the optimal exchange parameters in closed form and establish several important relationships between the efficiencies of these structures using circulant matrix theory. Based on these relationships, we conclude that the dimension exchange method favors high-dimensional networks.
With the diffusion method, a processor balances its workload with those of all its nearest neighbors at the same time, rather than one by one as in the dimension exchange method. Its efficiency depends on a diffusion parameter, which characterizes the behavior of a local balance operation. We analyze the diffusion method using circulant matrix theory and derive the optimal values of the diffusion parameter for the k-ary n-cube and its variants. Through statistical simulation, we show significant improvements due to the optimal exchange and diffusion parameters.
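In symbols, one diffusion step updates every processor as w_i <- w_i + alpha * sum over neighbors j of (w_j - w_i), where alpha is the diffusion parameter. The sketch below is our own illustrative code, not the book's; alpha = 1/3 is merely a workable choice for this small ring, not a claimed optimum.

```python
def diffusion_step(loads, neighbors, alpha):
    """One synchronous diffusion step:
    w_i <- w_i + alpha * sum_{j in N(i)} (w_j - w_i)."""
    return [loads[i] + alpha * sum(loads[j] - loads[i] for j in neighbors[i])
            for i in range(len(loads))]

# A six-processor ring: each processor's neighbor list.
n = 6
ring = [[(i - 1) % n, (i + 1) % n] for i in range(n)]
loads = [12.0, 0.0, 0.0, 0.0, 0.0, 0.0]
for _ in range(30):
    loads = diffusion_step(loads, ring, alpha=1 / 3)
print([round(w, 3) for w in loads])  # every load approaches the average, 2.0
```

Because every pairwise transfer is antisymmetric, the total workload is conserved at each step, which is why the iteration can only converge to the global average.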
Furthermore, we analyze the dimension exchange and diffusion methods under different workload models and system characteristics. We show that the optimally tuned dimension exchange algorithm outperforms the diffusion method in achieving a global balanced state in both the one-port and the all-port communication models. The strength of the diffusion method is in load sharing (i.e., keeping all processors busy without necessarily balancing their loads) in the all-port communication model.
The last application is parallel combinatorial optimization. We experiment with the dimension exchange and the diffusion methods for distributing dynamically generated workloads at run-time. Their performance is evaluated in the solution of set partitioning problems on two distributed memory parallel computers. Both methods are found to lead to an almost linear speedup in a system with 32 processors, and to a speedup of 146.8 in a system with 256 processors; they give the best results among all the methods we tried.
The authors are very grateful to Burkhard Monien, Ralf Diekmann, Reinhard Lüling and Stefan Tschoeke for their valuable comments and contributions to both the theoretical and the experimental aspects of this research. Thanks also go to Erich Koester and other associates of AG-Monien group for their considerate arrangements in both academic and non-academic affairs while the first author was in Germany.
Many other people have contributed to this book. We thank Professor Kai Hwang for the foreword, and Professors Dimitri P. Bertsekas, Tony Chan, Vipin Chaudhary, Henry Cheung, Francis Chin, Andrew Choi, George Cybenko, F. Meyer auf der Heide, David Nassimi, and Loren Schwiebert for their valuable input at various stages of this project. Special thanks go to our families, who suffered through many long nights of being neglected; their love, patience, and support have meant a lot to us. To Jiwen, who assisted us wholeheartedly from beginning to end, we are greatly indebted.