
3.3 Summary

Focusing on performance understanding, we have shown that benchmarking the communication system in a structured way yields a logical breakdown of that system. Because the individual components correspond to specific architectural features, it becomes easier to explore their strengths and weaknesses, assess design tradeoffs, and suggest remedial actions where appropriate.

The FEDP implementation shows that an additional memory copy performed on a relatively high-speed host causes no performance degradation when driving a slow network. In contrast, when the same mechanism is applied to a high-speed network, the DP protocol fails to drive Gigabit Ethernet at full speed, even on a high-end server. We therefore believe that designers of lightweight messaging systems should consider the performance gaps between processor, memory, and network, and should anticipate how these gaps will develop in the future. Although the above analysis shows that part of the GEDP deficiency comes from the PCI performance, the evaluation results also demonstrate that there is still room to improve its performance, especially the bi-directional bandwidth, which is involved in many collective operations.

Our model parameters show that the data movements accounted for in $O_{s}$ and $O_{r}$ affect the overall performance of Gigabit communication. Thus, further reduction of the protocol handling overheads ($O_{s}$, $O_{r}$, and $U_{r}$) is needed. In particular, since data movements are inevitable, one should focus on coordinating them to minimize memory copy costs. Besides the data movement overhead, the interrupt overhead is another weak point at present. Although interrupt coalescing may reduce the overhead for long messages, it also increases the per-packet latency, and is therefore unsuitable for small messages and for infrequent communication. One should adopt a heuristic that handles interrupts dynamically and more efficiently.
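To make the interrupt coalescing tradeoff concrete, the following is a minimal sketch of one possible dynamic heuristic. It is not the method used in this work: the function name, the latency budget, and the batch limit are all hypothetical values chosen for illustration. The idea is to coalesce aggressively only when packets arrive fast enough that waiting for a batch adds little delay, and to interrupt per packet when traffic is sparse.

```python
def choose_coalesce_count(arrival_rate_pps, latency_budget_us=50.0, max_batch=32):
    """Return how many packets to accumulate before raising one interrupt.

    Hypothetical heuristic: arrival_rate_pps is the recently observed
    packet arrival rate (packets per second); latency_budget_us is the
    extra delay we are willing to impose on the first packet of a batch.
    """
    if arrival_rate_pps <= 0:
        return 1  # no traffic observed: interrupt on every packet

    gap_us = 1e6 / arrival_rate_pps  # mean inter-arrival time in microseconds

    # Waiting for n packets delays the first one by roughly (n - 1) * gap_us,
    # so pick the largest n whose added delay stays within the budget.
    batch = int(latency_budget_us // gap_us) + 1

    return max(1, min(max_batch, batch))
```

Under these assumed defaults, sparse traffic (for example 1,000 packets per second) gets an interrupt for every packet, which keeps small-message latency low, while a saturated link is capped at the maximum batch size, which amortizes the interrupt cost over long messages.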

The above analysis has focused on point-to-point communication over a lightly loaded network; network congestion has therefore not yet been addressed. Networks react to congestion in different ways, depending on the switch hardware as well as the communication protocol adopted. In the next chapter, we extend our performance study from point-to-point analysis to a highly congested communication pattern, the many-to-one collective operation.

