Next: 5.2 Non-Blocking Switch Up: 5. Complete Exchange on Previous: 5. Complete Exchange on Contents

5.1 Complete Exchange

Complete exchange, also known as all-to-all personalized communication, is a collective operation takes place with a set of processes, and each process has a distinct set of data to transmit to every other process in the set. To minimize the communication delay, all processes are actively participating in the communication. It is known to be the most stringent communication requirement imposed on the interconnection network. Such a communication pattern occurs in numerous numerical and scientific applications.

Due to its importance, complete exchange operation has been extensively studied in the past. Most of the studies are focused on designing communication schedules to avoid contention delay induced by the topological constraints of the underlying networks, such as hypercubes [10], meshes [94], tori [104], fat-trees [81], multistage interconnection networks [121] and multi-dimension networks [33]. These algorithms exploit the full performance of the underlying networks by carefully scheduling communications to avoid both node and link contention. Thus, these communication schedules are almost shaped to the target network constraints.

General speaking, algorithms for complete exchange can be classified into two categories, the direct or indirect approaches. For the direct algorithm, each process directly sends those data blocks to each of the destination processes using separate communication steps. A clear advantage is that the messages are delivered right to the destinations without going through any intermediate nodes, hence, each message appears only once in the network. This favors networks with higher connectivity such as multistage interconnection networks and the crossbar networks, since the major issue is to schedule the transmissions such that no link contention takes place. For the indirect algorithm, data blocks for a set of destination processes are combined to a larger block and are sent to a representative process, this is then forward to the correct destination processes. The indirect approach reduces the number of communication steps to reduce the startup cost; however, it introduces more traffic in the network and extra software overheads in performing data permutation. Thus, indirect approach favors small size messages exchange, while direct approach favors long messages exchange [17,16,19].

To avoid link and node contention, some complete exchange algorithms split the communication schedule into multiple phases. Each phase corresponds to a contention-free routing of messages between nodes. This approach restrains the parallelism between different phases, as a process would not enter next phase unless it has finished the message exchange of the current phase. Besides, not all processes are active in each phase [57,104], therefore, inactive processes have to be kept idle for the whole phase. Furthermore, to achieve this contention-free synchronism, the schedule would induce substantial synchronization overhead. First, processes ahead of schedule cannot continue, this means some of the links carry no data. Second, it is hard to enforce this synchronism on a distributed system. In particular, synchronization achieved by software solutions (e.g., barrier) could contend with normal data transfer and this may become a waste of bandwidth. Bokhari et al. [18] have pointed out that by using a relaxed synchronization scheme that possibly increases the network contention, the overall performance of the complete exchange communication could be improved.

Next: 5.2 Non-Blocking Switch Up: 5. Complete Exchange on Previous: 5. Complete Exchange on Contents