Wednesday, August 10, 2022

... for a project entitled A Systematic Approach to Minimize Compression Error Propagation in HPC Applications.

Guanpeng Li is an Assistant Professor in the Department of Computer Science at the University of Iowa. His research interests are in the areas of HPC fault-tolerance and data reduction, safety of autonomous driving, and machine learning dependability.

Overview

Today’s high-performance computing (HPC) applications produce vast volumes of data for post-analysis, presenting a major storage and I/O burden for scientists. To reduce this burden, researchers have explored using lossy compression in their scientific research to shrink the data that applications produce. While lossy compression can effectively reduce data size, it also introduces errors into the compressed data that often lead to incorrect computation results. In the past, researchers have expended considerable effort to improve compression efficiency (e.g., compression ratios). However, no existing study has investigated how lossy compression errors propagate in HPC applications, or how the resulting error impact on the validity of computations can be minimized. Consequently, scientists hesitate to use lossy compression in their scientific research. There is therefore a critical need for an effective method to identify a compression strategy that provides not only a high compression ratio but also a low error impact for HPC applications.

In this project, we aim to develop a Compression-Error-Aware-Program-Analysis (CEAPA) framework for data-intensive domains. The primary goal is to automatically select a best-fit lossy compression strategy that minimizes the error impact for a target compression ratio. The proposed framework models compression error propagation in HPC applications by integrating program analysis and machine learning (ML) techniques. Our approach is data-driven and focuses on cosmology and climate applications, but it can be extended to various other domains.
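To make the idea of compression error propagation concrete, here is a minimal Python sketch. It is not the CEAPA framework: the quantizer is plain uniform scalar quantization standing in for an error-bounded lossy compressor, and `toy_kernel` (an iterated logistic map) is a hypothetical stand-in for an HPC computation, chosen only because it tends to amplify small perturbations. All names and parameter values are assumptions made for illustration.

```python
def quantize(values, error_bound):
    """Snap each value to the nearest multiple of 2 * error_bound.

    Plain uniform scalar quantization, used here only as a stand-in
    for an error-bounded lossy compressor (an assumption, not a real codec).
    The pointwise error is at most error_bound, up to floating-point rounding.
    """
    step = 2.0 * error_bound
    return [round(v / step) * step for v in values]


def toy_kernel(values, steps, r=3.9):
    """Hypothetical stand-in for an HPC computation: iterate the logistic map."""
    out = list(values)
    for _ in range(steps):
        out = [r * x * (1.0 - x) for x in out]
    return out


def max_abs_error(reference, approx):
    """Largest pointwise deviation between two equally long sequences."""
    return max(abs(a - b) for a, b in zip(reference, approx))


error_bound = 1e-3  # illustrative absolute error bound
original = [0.1 + 0.8 * i / 99 for i in range(100)]
decompressed = quantize(original, error_bound)

# Error introduced by the (stand-in) compressor on the input data.
input_error = max_abs_error(original, decompressed)

# Error observed in the result after running the same kernel on both inputs.
for steps in (0, 5, 10, 20):
    output_error = max_abs_error(toy_kernel(original, steps),
                                 toy_kernel(decompressed, steps))
    print(f"steps={steps:2d}  input error={input_error:.2e}  "
          f"output error={output_error:.2e}")
```

The input error stays within the chosen bound, while the output error depends on what the program does with the data. That gap between the two is what a propagation model would need to predict.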

Intellectual Merit

Modeling compression error propagation in HPC programs is challenging because existing lossy compressors are built on distinct principles and generate very different compression errors on diverse HPC data. Abstracting compression errors and reasoning about their propagation through different program structures is not straightforward, because different kinds of compression errors exhibit different propagation characteristics. This project includes four critical thrusts:

  1. Developing an accurate and efficient fault injection infrastructure that integrates with the fault models of commonly used lossy compression algorithms;
  2. Designing a fine-grained approach to characterize error propagation in HPC programs through program analysis and decomposition based on the data dependencies and life cycle of compressed data;
  3. Developing a predictive model using ML techniques to select a compression strategy that minimizes the error impact on a given program and compression ratio;
  4. Integrating our technique with domain-specific error impact metrics in real-world cosmological and climate applications, and demonstrating the effectiveness of the technique by selecting compression strategies that give low error impact for the same ratios.
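
The selection step described in the thrusts above can be sketched as follows. This is only an illustration under stated assumptions: the `Candidate` record, the `select_strategy` function, the compressor names, and every number are hypothetical placeholders, not results or interfaces from the project. In the proposed framework, the `error_impact` values would come from a predictive model and domain-specific metrics rather than from a hand-written table.

```python
from dataclasses import dataclass
from typing import Iterable, Optional


@dataclass(frozen=True)
class Candidate:
    """One compression strategy with its (hypothetical) predicted behavior."""
    compressor: str           # placeholder name, not a real compressor
    error_bound: float        # compressor setting
    compression_ratio: float  # predicted or measured ratio
    error_impact: float       # domain-specific impact score; lower is better


def select_strategy(candidates: Iterable[Candidate],
                    target_ratio: float) -> Optional[Candidate]:
    """Return the lowest-impact candidate that reaches the target ratio."""
    eligible = [c for c in candidates if c.compression_ratio >= target_ratio]
    if not eligible:
        return None  # no strategy meets the requested ratio
    return min(eligible, key=lambda c: c.error_impact)


# Placeholder values for illustration only.
candidates = [
    Candidate("compressor_a", 1e-3, 8.0, 0.02),
    Candidate("compressor_b", 1e-2, 15.0, 0.05),
    Candidate("compressor_c", 1e-2, 12.0, 0.09),
    Candidate("compressor_a", 1e-1, 40.0, 0.30),
]

best = select_strategy(candidates, target_ratio=10.0)
print(best)
```

Treating the target ratio as a hard constraint and the error impact as the objective mirrors the stated goal of selecting strategies that give low error impact for the same ratios.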

Broader Impacts

The success of this research is essential to making progress in many HPC-related disciplines, where applications generate large amounts of data for post-analysis and visualization. It can not only help scientists determine low-error-impact lossy compression strategies but also help developers improve compression algorithms, fully enabling in-situ data reduction in exascale applications and systems. It will also systematically enhance researchers’ understanding of the trade-offs between the benefits of lossy compression and its negative impacts on computation validity. Moreover, this project will substantially benefit the education and training of undergraduate and graduate students by enhancing the quality of computer science curricula related to HPC and of outreach activities at the University of Iowa (UI), Washington State University (WSU), and other partners. Specifically, the PIs will integrate research findings related to resilience to lossy compression errors into their courses at UI and WSU. They will recruit students with diverse backgrounds, particularly underrepresented minority and female students. They will establish a training program to help scientists from universities and national labs learn how to leverage compression to advance their HPC data management. This project will also strengthen partnerships with national labs by supporting their critical exascale systems and applications.


This three-year project, a collaboration with Dingwen Tao at IU-Bloomington, has been awarded $600,000 for FY 2022.

NSF OAC CORE offers competitive funding that "supports translational research and education in all aspects of advanced cyberinfrastructure that lead to deployable, scalable and sustainable systems capable of transforming science and engineering."

More on HPC research by Li and Peng Jiang may be found on the IOWA-HPC Lab website.