Networking issues in distributed real-time systems

2002


Abstract (summary)

Networking spans every aspect of network infrastructure design, from the selection or synthesis of the interconnection topology to the choice of communication protocols and the way the network is deployed and maintained. A large body of literature is available on these issues. We add to it by examining two specific issues: the synthesis of networks that satisfy multiple properties, and the design of fault-tolerant communication services for high-speed networks.

Synthesizing networks that satisfy multiple requirements, such as high reliability, low diameter, and good embeddability, is a difficult problem for which no completely satisfactory solution exists. Our approach is a simple filtration process that takes as input a large number of randomly generated graphs. Multiple filters, one per requirement, are arranged so that each feeds the next, and the final output is a short list of networks from which the designer can choose. Our experimental results show that this approach is both practical and powerful. Perhaps our biggest achievement is showing that this seemingly simple approach can generate networks that are serious competitors to several well-known traditional networks. We further highlight the practical applicability of these networks by considering how they can be used effectively in a packaging environment.
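The filtration idea can be illustrated with a minimal sketch. It is not the dissertation's implementation: the graph generator, the three filters (connectivity, a diameter bound, and survival of any single link failure as a crude reliability proxy), and all names and parameter values below are assumptions chosen only to show how cascaded filters reduce a pool of random graphs to a short list.

```python
import random
from collections import deque


def random_graph(n, m, rng):
    """Adjacency dict for a random simple graph with n nodes and m edges (illustrative generator)."""
    all_edges = [(u, v) for u in range(n) for v in range(u + 1, n)]
    adj = {u: set() for u in range(n)}
    for u, v in rng.sample(all_edges, m):
        adj[u].add(v)
        adj[v].add(u)
    return adj


def eccentricity(adj, src):
    """Longest BFS distance from src, or None if some node is unreachable."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for v in adj[u]:
            if v not in dist:
                dist[v] = dist[u] + 1
                queue.append(v)
    return max(dist.values()) if len(dist) == len(adj) else None


def diameter(adj):
    """Graph diameter, or None if the graph is disconnected."""
    eccs = [eccentricity(adj, s) for s in adj]
    return None if None in eccs else max(eccs)


def is_connected(adj):
    return eccentricity(adj, next(iter(adj))) is not None


def has_low_diameter(adj, limit=4):
    d = diameter(adj)
    return d is not None and d <= limit


def survives_single_link_failure(adj):
    """True if removing any one edge leaves the graph connected."""
    for u in adj:
        for v in list(adj[u]):
            if u < v:
                adj[u].remove(v)
                adj[v].remove(u)
                ok = is_connected(adj)
                adj[u].add(v)
                adj[v].add(u)
                if not ok:
                    return False
    return True


def run_filters(candidates, filters):
    """Apply the filters in order; each filter's survivors feed the next one."""
    for keep in filters:
        candidates = [g for g in candidates if keep(g)]
    return candidates


# Hypothetical sizes: 200 candidate graphs with 12 nodes and 18 links each.
rng = random.Random(0)
candidates = [random_graph(12, 18, rng) for _ in range(200)]
short_list = run_filters(
    candidates, [is_connected, has_low_diameter, survives_single_link_failure]
)
print(f"{len(short_list)} of {len(candidates)} candidates passed every filter")
```

Placing the cheapest filter first keeps the more expensive checks from running on graphs that would be rejected anyway, which is one reason to arrange filters as a cascade.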

The interconnection network can have a dominant effect on the reliability of a distributed system. While existing network software has been optimized for performance, it has not been able to deal effectively with network failures. We have developed a lightweight fault detection and recovery technique that provides coverage for almost all network interface failures. Detection is based on software watchdog timers, and recovery is based on delta-logging. We have implemented the schemes as a fault tolerance layer over Myrinet, a commercially available networking technology. The implementation showed that a fault detection time of 1 ms and a complete recovery time of around 0.5 seconds can be achieved with a performance impact of less than 10%. The effectiveness of our fault tolerance schemes was evaluated using a versatile performance and recovery analysis tool called RAPIDS.
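As a general illustration of detection by software watchdog timer, the sketch below shows the usual pattern: the monitored component resets ("kicks") a timer whenever it makes progress, and a supervisor declares a fault if the timer is not reset within a timeout. The class name, method names, and injected clock are hypothetical; this is not the Myrinet fault tolerance layer described in the abstract, and it does not cover recovery.

```python
import time


class SoftwareWatchdog:
    """Flags a fault when the monitored component stops reporting progress."""

    def __init__(self, timeout_s, clock=time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the behavior can be checked deterministically
        self.last_kick = clock()

    def kick(self):
        """Called by the monitored component each time it makes progress."""
        self.last_kick = self.clock()

    def expired(self):
        """Polled by the supervisor; True means no progress within the timeout."""
        return self.clock() - self.last_kick > self.timeout_s


# Demonstration with a simulated clock and an assumed 1 ms timeout.
now = [0.0]
watchdog = SoftwareWatchdog(timeout_s=0.001, clock=lambda: now[0])

now[0] = 0.0005
watchdog.kick()            # component is alive at t = 0.5 ms
now[0] = 0.0014
healthy = watchdog.expired()   # 0.9 ms since the last kick: still within the timeout
now[0] = 0.0020
faulty = watchdog.expired()    # 1.5 ms since the last kick: timeout exceeded
print(healthy, faulty)
```

The timeout bounds the detection latency: a shorter timeout detects a hung component sooner but requires the component to report progress more often.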

Indexing (details)

Electrical engineering;
Computer science
0544: Electrical engineering
0984: Computer science
Identifier / keyword
Applied sciences; Distributed real-time; Fault tolerance; Interconnection networks; Networking
Networking issues in distributed real-time systems
Lakamraju, Vijaya Ramaraju
Number of pages
Publication year
Degree date
School code
DAI-B 63/06, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
9780493716572, 0493716572
Koren, Israel
University of Massachusetts Amherst
University location
United States -- Massachusetts
Source type
Dissertations & Theses
Document type
Dissertation/thesis number
ProQuest document ID
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL