
Networking issues in distributed real-time systems


2002


Abstract

Networking involves every aspect of the design of the network infrastructure, from the selection or synthesis of the interconnection topology, to the choice of communication protocols, to how the network is deployed and maintained. A large body of literature already addresses these issues. We add to it by examining two specific problems: the synthesis of networks that satisfy multiple properties, and the design of fault-tolerant communication services for high-speed networks.

Synthesizing networks that satisfy multiple requirements, such as high reliability, low diameter, and good embeddability, is a difficult problem for which no completely satisfactory solution exists. Our approach uses a simple filtration process that takes a large number of randomly generated graphs as input. We apply multiple filters, one per requirement, and chain them so that the output of each filter feeds the next; the final output is a short list of networks from which the designer can choose. Our experimental results show that this approach is both practical and powerful. Perhaps our most significant result is showing that this seemingly simple approach can generate networks that are serious competitors to several well-known traditional networks. We further demonstrate the practical applicability of these networks by considering how they can be used effectively in a packaging environment.
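The filtration process described above can be sketched in a few lines of Python. This is a minimal sketch under stated assumptions, not the dissertation's implementation: candidate networks are random simple graphs with a fixed node and edge count, minimum node degree stands in as a hypothetical proxy for reliability, and diameter (computed by breadth-first search) serves as the low-diameter requirement. The helper names (random_graph, diameter, min_degree, filtration) are illustrative only.

```python
import random
from collections import deque
from typing import Callable, Dict, Iterable, Iterator, List, Set

Graph = Dict[int, Set[int]]  # adjacency sets for nodes 0..n-1
Filter = Callable[[Graph], bool]


def random_graph(n: int, m: int, rng: random.Random) -> Graph:
    """Random simple undirected graph with n nodes and exactly m edges."""
    assert m <= n * (n - 1) // 2, "too many edges for a simple graph"
    g: Graph = {v: set() for v in range(n)}
    edges = 0
    while edges < m:
        u, v = rng.sample(range(n), 2)  # two distinct endpoints
        if v not in g[u]:  # skip duplicate edges
            g[u].add(v)
            g[v].add(u)
            edges += 1
    return g


def eccentricity(g: Graph, src: int) -> float:
    """Longest shortest-path distance from src (BFS); inf if g is disconnected."""
    dist = {src: 0}
    queue = deque([src])
    while queue:
        u = queue.popleft()
        for w in g[u]:
            if w not in dist:
                dist[w] = dist[u] + 1
                queue.append(w)
    return max(dist.values()) if len(dist) == len(g) else float("inf")


def diameter(g: Graph) -> float:
    return max(eccentricity(g, v) for v in g)


def min_degree(g: Graph) -> int:
    return min(len(nbrs) for nbrs in g.values())


def filtration(candidates: Iterable[Graph], filters: List[Filter]) -> Iterator[Graph]:
    """Chain the filters so each one only sees graphs that passed the previous ones."""
    stream: Iterator[Graph] = iter(candidates)
    for f in filters:
        stream = filter(f, stream)
    return stream


# Usage: 2000 random 16-node, 40-edge candidates, cheapest filter first.
rng = random.Random(42)
pool = (random_graph(16, 40, rng) for _ in range(2000))
shortlist = list(filtration(pool, [
    lambda g: min_degree(g) >= 3,  # hypothetical reliability proxy
    lambda g: diameter(g) <= 4,    # low-diameter requirement
]))
print(f"{len(shortlist)} of 2000 candidates survived both filters")
```

Cheap filters are placed first so that expensive ones, such as the all-pairs BFS behind diameter(), run only on graphs that survived earlier stages. Because the filters are chained lazily, each candidate graph is generated and tested one at a time rather than held in memory as a full pool.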

The interconnection network can have a dominant effect on the reliability of a distributed system. While existing network software has been optimized for performance, it has not dealt with network failures effectively. We have developed a lightweight fault detection and recovery technique that covers almost all network interface failures. Detection is based on software watchdog timers, and recovery is based on delta-logging. We implemented these schemes as a fault tolerance layer over Myrinet, a commercially available networking technology. The implementation achieved a fault detection time of 1 ms and a complete recovery time of around 0.5 seconds, with a performance impact of less than 10%. We evaluated the effectiveness of our fault tolerance schemes using RAPIDS, a versatile performance and recovery analysis tool.
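A minimal Python sketch of the two mechanisms named above, under assumptions stated here. The watchdog assumes the monitored component (for instance, the network interface firmware) must signal liveness, or "kick", within a fixed timeout; a missed deadline is treated as a failure. For delta-logging we assume a scheme in which the host keeps a checkpoint of the interface's state plus a log of only the changes made since that checkpoint, and recovery restores the checkpoint and replays the log. The class names, the state keys, and the injectable clock are hypothetical; the actual layer over Myrinet may work differently.

```python
import time
from typing import Callable, Dict, List, Tuple


class SoftwareWatchdog:
    """Declares a failure if kick() is not called within timeout_s seconds."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock  # injectable so the example is deterministic
        self.last_kick = clock()

    def kick(self) -> None:
        """Called periodically by the monitored component to signal liveness."""
        self.last_kick = self.clock()

    def expired(self) -> bool:
        """Polled by the host: True once the component has missed its deadline."""
        return self.clock() - self.last_kick > self.timeout_s


class DeltaLoggedState:
    """Interface state protected by a checkpoint plus a log of later changes.

    The checkpoint and log are assumed to live in host memory, which
    survives a failure of the network interface itself.
    """

    def __init__(self) -> None:
        self._live: Dict[str, int] = {}
        self._checkpoint: Dict[str, int] = {}
        self._deltas: List[Tuple[str, int]] = []

    def set(self, key: str, value: int) -> None:
        self._live[key] = value
        self._deltas.append((key, value))  # log only the change, not the full state

    def checkpoint(self) -> None:
        """Snapshot the full state and truncate the delta log."""
        self._checkpoint = dict(self._live)
        self._deltas.clear()

    def crash(self) -> None:
        """Simulate loss of the volatile state held on the interface."""
        self._live = {}

    def recover(self) -> Dict[str, int]:
        """Restore the checkpoint, then replay the logged deltas in order."""
        state = dict(self._checkpoint)
        for key, value in self._deltas:
            state[key] = value
        self._live = state
        return dict(state)

    @property
    def live(self) -> Dict[str, int]:
        return dict(self._live)


class FakeClock:
    """Manually advanced clock, used here instead of real time."""

    def __init__(self) -> None:
        self.t = 0.0

    def __call__(self) -> float:
        return self.t


# Usage: one healthy kick, then a simulated hang.
clock = FakeClock()
# Illustrative 1 ms timeout echoing the reported detection time, not the actual setting.
wd = SoftwareWatchdog(timeout_s=0.001, clock=clock)
nic = DeltaLoggedState()
nic.set("port0.send_seq", 10)
nic.checkpoint()
nic.set("port0.send_seq", 11)  # only these two changes are logged
nic.set("port1.recv_seq", 3)
clock.t += 0.0005
wd.kick()                      # firmware still alive
clock.t += 0.002               # firmware stops kicking
detected = wd.expired()
if detected:
    nic.crash()
    recovered = nic.recover()
    print(recovered)  # {'port0.send_seq': 11, 'port1.recv_seq': 3}
```

In this sketch, logging only the changes since the last checkpoint keeps each update cheap, while recovery pays for a replay whose length grows with the number of changes since that checkpoint; taking checkpoints periodically bounds the replay.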

Indexing


Subject
Electrical engineering;
Computer science
Classification
0544: Electrical engineering
0984: Computer science
Identifier / keyword
Applied sciences, Distributed real-time, Fault tolerance, Interconnection networks, Networking
Title
Networking issues in distributed real-time systems
Author
Lakamraju, Vijaya Ramaraju
Number of pages
128
Publication year
2002
Degree date
2002
School code
0118
Source
DAI-B 63/06, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780493716572, 0493716572
Advisor
Koren, Israel
University/institution
University of Massachusetts Amherst
University location
United States -- Massachusetts
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3056251
ProQuest document ID
275656222
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/275656222