Characterization and avoidance of critical pipeline structures in aggressive superscalar processors
In recent years, with only small fractions of modern processors now accessible in a single cycle, computer architects constantly fight against propagation issues across the die. Unfortunately this trend continues to shift inward, and now the even most internal features of the pipeline are designed around communication, not computation. To address the inward creep of this constraint, this work focuses on the characterization of communication within the pipeline itself, architectural techniques to avoid it when possible, and layout co-design for early detection of problems. I present work in creating a novel detection tool for common case operand movement which can rapidly characterize an application's dataflow patterns. The results produced are suitable for exploitation as a small number of patterns can describe a significant portion of modern applications. Work on dynamic dependence collapsing takes the observations from the pattern results and shows how certain groups of operations can be dynamically grouped, avoiding unnecessary communication between individual instructions. This technique also amplifies the efficiency of pipeline data structures such as the reorder buffer, increasing both IPC and frequency. I also identify the same sets of collapsible instructions at compile time, producing the same benefits with minimal hardware complexity. This technique is also done in a backward compatible manner as the groups are exposed by simple reordering of the binary's instructions. I present aggressive pipelining approaches for these resources which avoids the critical timing often presumed necessary in aggressive superscalar processors. As these structures are designed for the worst case, pipelining them can produce greater frequency benefit than IPC loss. I also use the observation that the dynamic issue order for instructions in aggressive superscalar processors is predictable. Thus, a hardware mechanism is introduced for caching the wakeup order for groups of instructions efficiently. These wakeup vectors are then used to speculatively schedule instructions, avoiding the dynamic scheduling when it is not necessary. Finally, I present a novel approach to fast and high-quality chip layout. By allowing architects to quickly evaluate ‘what if’ scenarios during early high-level design, chip designs are less likely to encounter implementation problems later in the process.
0984: Computer science