
On-chip monitoring infrastructures and strategies for multi-core and many-core systems


2012


Abstract (summary)

On-chip sensors are widely used in processors to closely monitor system temperature, performance, and supply power fluctuation, among other environmental conditions. As the number of cores integrated in a single die increases, the total number of on-chip sensors increases correspondingly. Information from these sensors needs to be collected and processed efficiently and effectively at run-time to achieve high performance and low power consumption at the system level. This dissertation examines dedicated infrastructures for sensor data collection, sensor measurement calibration, and the use of sensor information to overcome thermal emergencies, voltage droops, and soft errors. These problems are addressed in both multi-core and many-core environments.

This dissertation first shows that a dedicated on-chip monitoring infrastructure (monitor network-on-chip, MNoC) can achieve better performance than a bus in interconnecting on-chip sensors in multi-core systems. Our experimental results show that this dedicated on-chip network can provide consistently low latency for sensor data packets without affecting application on-chip network traffic. The necessity of a dedicated infrastructure for monitoring is then addressed in a many-core environment. A two-level hierarchical network-on-chip (NoC), which allows for efficient sensor data collection in many-core systems, is introduced. This design is evaluated using benchmark-driven simulations for a three-dimensional many-core system. The use of a two-level NoC is shown to provide an average sensor data latency improvement of 65% versus a flat sensor data NoC structure for a 256-core system.

As the number of on-chip sensors increases, the accuracy of these sensors' measurements becomes increasingly important, since it directly affects processor performance and reliability. For example, on-chip thermal sensors are used for monitoring system temperature, and their measurements may affect the system frequency and operating voltage. A new approach is introduced in this dissertation, which determines models for imprecise thermal sensor measurements using probability distributions based on device parameters. Thermal measurements that are determined to be imprecise can be excluded from thermal management strategies. The collection of on-chip sensor measurements is facilitated by dedicated on-chip monitoring infrastructures. Experiments show that a sensor operating outside a desired precision can be identified with a detection rate of 87% and an average false alarm rate below 6%, with a confidence level of 90%.
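
The idea of excluding imprecise thermal measurements can be illustrated with a minimal sketch. It is not the dissertation's method: it assumes a Gaussian error model with a known standard deviation and a trusted reference temperature, whereas the abstract describes distributions derived from device parameters. The function name and all parameters are hypothetical.

```python
from statistics import NormalDist


def flag_imprecise(readings, reference, sigma, confidence=0.90):
    """Return indices of readings outside a two-sided confidence band.

    Assumption (not from the source): sensor error is Gaussian with
    standard deviation `sigma` around a trusted `reference` temperature.
    """
    # Half-width of the band, expressed in standard deviations.
    z = NormalDist().inv_cdf(0.5 + confidence / 2.0)
    limit = z * sigma
    return [i for i, r in enumerate(readings) if abs(r - reference) > limit]


# Hypothetical readings in degrees Celsius; the third deviates strongly.
suspect = flag_imprecise([70.0, 70.5, 75.0], reference=70.0, sigma=1.0)
```

Readings whose indices are returned would be left out of any thermal management decision in this sketch.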

The introduction of dedicated infrastructures for on-chip monitoring opens the door for more advanced run-time system control strategies. In this dissertation, run-time voltage droop compensation and soft error protection in multi-cores are targeted. High voltage droops in modern processors may cause serious reliability problems. A voltage droop compensation method that considers ambient temperature changes is proposed to address this issue. In the proposed method, different reduced frequencies are used at different temperatures. A voltage droop signature sharing method in multi-core systems is proposed for early detection and remediation of high voltage droops. These two methods are combined and implemented in an 8-core system, and an average performance benefit of 5% is achieved.
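
A rough sketch of how temperature-dependent frequency reduction and signature sharing might fit together follows. Everything here is an assumption made for illustration: the temperature bounds, the frequency values (including their direction and magnitude), and the function names are placeholders and do not come from the dissertation.

```python
import bisect

# Hypothetical table: temperature bounds in degrees Celsius and the
# reduced frequency (GHz) used in each resulting temperature range.
TEMP_BOUNDS_C = [40.0, 60.0, 80.0]
REDUCED_FREQ_GHZ = [2.8, 2.6, 2.4, 2.2]  # one entry per range
NOMINAL_FREQ_GHZ = 3.0

# Droop signatures reported by any core, visible to all cores.
shared_signatures = set()


def report_droop(signature):
    """Record a signature that preceded a high voltage droop on some core."""
    shared_signatures.add(signature)


def select_frequency(temp_c, droop_expected):
    """Pick the nominal frequency, or a reduced one for the current range."""
    if not droop_expected:
        return NOMINAL_FREQ_GHZ
    idx = bisect.bisect_right(TEMP_BOUNDS_C, temp_c)
    return REDUCED_FREQ_GHZ[idx]


def frequency_for(signature, temp_c):
    """Reduce frequency early if another core already reported this signature."""
    return select_frequency(temp_c, signature in shared_signatures)
```

In this sketch, a core that observes a droop calls `report_droop`, and every core consults `frequency_for` before running code with a matching signature.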

Soft errors are caused by alpha particles and cosmic rays, among other sources, and can be detected through processor component redundancy. An approach to selectively enable redundancy to combat soft errors is also proposed in this dissertation. Both dual modular redundancy (DMR) and chip-level redundant threading (CRT) are used for adaptive redundancy protection. Power and energy savings of over 8% are achieved by both methods compared to conventional methods. A multi-core architecture vulnerability factor (AVF) is also calculated for a multi-core environment, using the MNoC infrastructure.

Indexing (details)


Subject: Electrical engineering
Classification: 0544: Electrical engineering
Identifier / keyword: Applied sciences; Monitoring infrastructures; On-chip sensors
Title: On-chip monitoring infrastructures and strategies for multi-core and many-core systems
Author: Zhao, Jia
Number of pages: 165
Publication year: 2012
Degree date: 2012
School code: 0118
Source: DAI-B 73/12(E), Dissertation Abstracts International
Place of publication: Ann Arbor
Country of publication: United States
ISBN: 9781267496829
Advisor: Tessier, Russell
Committee member: Burleson, Wayne; Krishna, C. Mani; Whitaker, Nathaniel
University/institution: University of Massachusetts Amherst
Department: Electrical & Computer Engineering
University location: United States -- Massachusetts
Degree: Ph.D.
Source type: Dissertations & Theses
Language: English
Document type: Dissertation/Thesis
Dissertation/thesis number: 3518401
ProQuest document ID: 1034280483
Copyright: Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.
Document URL: http://search.proquest.com/docview/1034280483