Abstract

Traditionally, High Performance Computing (HPC) software has been built and deployed as bulk-synchronous, parallel executables based on the message-passing interface (MPI) programming model. The rise of data-oriented computing paradigms and an explosion in the variety of applications that need to be supported on HPC platforms have forced a rethink of the appropriate programming and execution models to integrate this new functionality. In situ workflows mark a paradigm shift in HPC software development methodologies, enabling a range of new applications, from user-level data services to machine learning (ML) workflows that run alongside traditional scientific simulations.

By tracing the evolution of HPC software development over the past 30 years, this dissertation identifies the key elements and trends responsible for the emergence of coupled, distributed, in situ workflows. This dissertation’s focus is on coupled in situ workflows involving composable, high-performance microservices. After outlining the motivation for performance observability of these services and explaining why existing HPC performance tools and techniques cannot be applied in this context, this dissertation proposes a set of techniques that gather, analyze, and orient performance data from different sources to generate observability. By leveraging microservice components initially designed to build high-performance data services, this dissertation demonstrates their broader applicability for building and deploying performance monitoring and visualization as services within an in situ workflow. The results from this dissertation suggest that: (1) integration of performance data from different sources is vital to understanding the performance of service components, (2) the in situ (online) analysis of this performance data is needed to enable the adaptivity of distributed components and manage monitoring data volume, (3) statistical modeling combined with performance observations can help generate better service configurations, and (4) services are a promising architecture choice for deploying in situ performance monitoring and visualization functionality. This dissertation includes previously published and co-authored material and unpublished co-authored material.

Details

Title
Performance Observability and Monitoring of High Performance Computing with Microservices
Author
Ramesh, Srinivasan
Publication year
2022
Publisher
ProQuest Dissertations & Theses
ISBN
9798841746478
Source type
Dissertation or Thesis
Language of publication
English
ProQuest document ID
2714619502
Copyright
Database copyright ProQuest LLC; ProQuest does not claim copyright in the individual underlying works.