Managing resources for high performance and low energy in general-purpose processors
Microarchitectural techniques such as superscalar instruction issue, out-of-order (OOO) instruction execution, Simultaneous Multi-Threading (SMT), and Chip Multi-Processing (CMP) improve processor performance dramatically. However, as processor designs grow more complex, managing the abundant processor resources to achieve both high performance and low power consumption becomes increasingly difficult. This dissertation investigates resource usage control techniques for general-purpose microprocessors, covering both single-context and multi-context designs, and targets both energy and performance.
We first address power-inefficient resource usage in single-context processors and propose Compiler-based Adaptive Fetch Throttling (CAFT), a technique that combines the benefits of hardware-based runtime throttling and software-based static throttling to provide good energy savings with low performance loss. Our simulation results show that CAFT doubles the energy-delay product (EDP) savings of fixed-threshold throttling.
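For readers unfamiliar with the metric, EDP is the product of the energy consumed and the execution time, so a technique only improves EDP if its energy savings outweigh any slowdown it causes. The sketch below is purely illustrative: the function names and the numbers in the usage note are hypothetical and are not taken from our simulations.

```python
def energy_delay_product(energy_joules, delay_seconds):
    """EDP = energy * delay; lower is better."""
    return energy_joules * delay_seconds


def edp_savings(baseline, candidate):
    """Fractional EDP reduction of `candidate` relative to `baseline`.

    Each argument is an (energy_joules, delay_seconds) pair.
    A positive result means the candidate has a lower EDP.
    """
    base = energy_delay_product(*baseline)
    new = energy_delay_product(*candidate)
    return (base - new) / base
```

For instance, a hypothetical throttling scheme that cuts energy from 10 J to 8 J while lengthening execution from 1.00 s to 1.05 s yields an EDP savings of about 16%, less than the 20% energy savings because of the added delay.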
We then introduce the resource competition problem in SMT processors, which let multiple threads share processor resources simultaneously and thereby improve energy efficiency indirectly. We present a novel Adaptive Resource Partitioning Algorithm (ARPA) to control the usage and sharing of processor resources in SMT processors. ARPA analyzes the resource usage efficiency of each thread over a time period and assigns more resources to the threads that can use them more efficiently. Simulation results on a set of 42 multiprogrammed workloads show that ARPA outperforms the best previously proposed dynamic resource allocation technique, Hill-climbing, by 5.7% in overall instruction throughput. When the fairness accorded to each thread is considered, ARPA attains a 9.2% improvement over Hill-climbing under a commonly used fairness metric.
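The general idea of efficiency-driven partitioning can be sketched as follows. This is a minimal illustration under stated assumptions, not the ARPA implementation itself: the efficiency measure (committed instructions per allocated entry), the step size, and the per-thread floor are all hypothetical choices made for the example.

```python
def repartition(allocation, committed, step=4, floor=8):
    """Return a new per-thread allocation after one epoch.

    allocation: dict thread_id -> resource entries currently assigned (> 0)
    committed:  dict thread_id -> instructions committed in the last epoch
    step:       entries moved per epoch (assumed value)
    floor:      minimum entries any thread keeps (assumed value)
    """
    # Assumed efficiency measure: committed instructions per allocated entry.
    efficiency = {t: committed[t] / allocation[t] for t in allocation}
    best = max(efficiency, key=efficiency.get)
    worst = min(efficiency, key=efficiency.get)

    new_alloc = dict(allocation)
    if best == worst:
        return new_alloc  # all threads equally efficient; nothing to move

    # Shift entries from the least to the most efficient thread,
    # without pushing the donor below its floor.
    moved = min(step, new_alloc[worst] - floor)
    if moved > 0:
        new_alloc[worst] -= moved
        new_alloc[best] += moved
    return new_alloc
```

Calling this once per epoch moves the partition gradually toward the threads that commit more instructions per entry, while the floor keeps any single thread from being starved of resources. The total number of entries is preserved by construction.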
We also propose resource adaptation approaches that adaptively control the number of powered-on reorder buffer (ROB) entries and partition shared resources among threads, for both shared-ROB and divided-ROB structures, targeting both high performance and low energy. Like ARPA, these approaches consider the relative resource usage efficiency of each thread, but they also take into account each thread's actual resource usage in order to identify inefficient usage and save energy. Our experimental results show that, for an SMT processor with a shared-ROB structure, our resource adaptation approach achieves 16.7% energy savings over ARPA with negligible performance loss across 42 sample workloads. For an SMT processor with a divided-ROB structure, our approach outperforms ARPA by 4.2% while also achieving 12.4% energy savings.
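To make the notion of powering ROB entries on and off concrete, the following sketch sizes the powered-on portion of a thread's ROB partition from its observed occupancy. It is an illustration only: the use of peak occupancy as the usage signal, the bank granularity, and the minimum size are assumptions made for this example and do not describe the actual adaptation algorithms.

```python
def powered_on_entries(allocated, peak_occupancy, bank_size=8, min_entries=8):
    """Number of ROB entries to keep powered on for the next epoch.

    allocated:      entries currently assigned to the thread
    peak_occupancy: most entries the thread held at once in the last epoch
    bank_size:      entries that are gated together (assumed value)
    min_entries:    smallest powered-on size allowed (assumed value)
    """
    needed = max(peak_occupancy, min_entries)
    banks = -(-needed // bank_size)  # ceiling division to whole banks
    # Never power on more than the thread has been allocated.
    return min(allocated, banks * bank_size)
```

Under these assumptions, a thread that was allocated 64 entries but never held more than 19 would keep 24 entries powered on, and the remaining 40 could be gated off until its usage grows.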