Using statistical monitoring to detect failures in Internet services


2005

Abstract (summary)

Since the Internet's popular emergence in the mid-1990s, Internet services such as e-mail and messaging systems, search engines, e-commerce, and news and financial sites have become an important and often mission-critical part of our society. Unfortunately, managing these systems and keeping them running is a significant challenge. Their rapid rate of change, as well as their size and complexity, means that the developers and operators of these services usually have only an incomplete idea of how the system works and even what it is supposed to do. This results in poor fault management, as operators have a hard time diagnosing faults and an even harder time detecting them.

This dissertation argues that statistical monitoring—the use of statistical analysis and machine learning techniques to analyze live observations of a system's behavior—can be an important tool in improving the manageability of Internet services. Statistical monitoring has several important features that are well suited to managing Internet services. First, the dynamic analysis of a system's behavior in statistical monitoring means that there is no dependency on specifications or descriptions that might be stale or incorrect. Second, monitoring a live, deployed system gives insight into system behavior that cannot be achieved in QA or testing environments. Third, automatic analysis through statistical monitoring can better cope with larger and more complex systems, aiding human operators as well as automating parts of the system management process.

This dissertation presents a statistical monitoring approach to three fault management problems: detecting failures in Internet services without requiring a priori knowledge of correct application behavior; automatically inferring undocumented system structure and invariants; and localizing the potential cause of a failure given its symptoms. We describe our methodology as well as our experiments with prototype implementations. Our experience provides strong support for statistical monitoring, and suggests that it may prove to be an important tool in improving the manageability and reliability of Internet services.

Indexing (details)


Subject
Computer science
Classification
0984: Computer science
Identifier / keyword
Applied sciences; Failures; Fault detection; Internet services; Statistical monitoring
Title
Using statistical monitoring to detect failures in Internet services
Author
Kiciman, Emre
Number of pages
168
Publication year
2005
Degree date
2005
School code
0212
Source
DAI-B 66/08, Dissertation Abstracts International
Place of publication
Ann Arbor
Country of publication
United States
ISBN
9780542295218, 0542295210
Advisor
Fox, Armando
University/institution
Stanford University
University location
United States -- California
Degree
Ph.D.
Source type
Dissertations & Theses
Language
English
Document type
Dissertation/Thesis
Dissertation/thesis number
3187303
ProQuest document ID
305393053
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL
http://search.proquest.com/docview/305393053