
4. Congestive Loss on High-Speed Communication

Understanding contention is crucial to high-performance computing, because contention can arise in the hosts, on the network links, and within the routers. Furthermore, the degree of contention directly affects the sustainable performance of a particular architecture-application pair. Different combinations of hardware and software, together with different communication patterns and schedules, may trigger different congestion behavior. This makes modeling the congestion behavior of a global communication event a challenging task. Although directly measuring the target architecture-application pair can tell us how much performance is lost to contention, it does not reveal the actual phenomenon that induces the loss.

In this chapter, we focus on the congestion behavior of Ethernet-based lightweight communication systems under heavy congestive loss. In particular, we model the error path of a user-space Go-Back-N reliable transmission protocol built on top of the Directed Point (DP) low-latency communication system. During the modeling exercises, we identify salient features that enhance our understanding of the packet loss problem and are therefore relevant to the design and analysis of "contention-friendly" communication schemes.

We start this chapter by discussing the importance of reliability in lightweight communication systems. Section 4.2 gives a brief survey of how other Ethernet-based low-latency communication schemes support reliability and describes our Go-Back-N protocol. Then, in Section 4.3, we examine and model the congestion dynamics of different buffering architectures under the many-to-one congestion loss problem. These analyses are supported by experimental evaluations on real platforms. In Section 4.3.4, we further corroborate our analysis by extending the model to cover different network configurations and communication patterns. Finally, we discuss related studies in Section 4.4 and summarize our contributions in Section 4.5.
