PhD Candidate: Shihui Song
Abstract: Scientific simulations and modern AI workloads generate data at massive scale. When these workloads run on accelerator systems, three major bottlenecks often arise: communication overhead, memory consumption, and I/O burden. My research addresses these challenges through system and compiler approaches that improve the efficiency of data movement and storage.
In this talk, I will present my work targeting these bottlenecks from different perspectives. I will begin with a data placement method for graph neural network (GNN) training that reduces costly CPU–GPU and GPU–GPU data transfers. I will then present three lossy compression works for the Cerebras Wafer-Scale Engine (WSE) that reduce memory usage and I/O costs. The first is CereSZ, the first error-bounded lossy compressor designed for the WSE. Next, I will introduce WaferSZ, which improves compression efficiency on wafer-scale architectures with a fixed-size Huffman encoding scheme. Finally, I will present P3Z, a domain-specific compiler that lets users describe compression algorithms in high-level Python and automatically generates optimized implementations for both CPU and WSE backends. Together, these efforts demonstrate how system–algorithm co-design can improve the scalability and efficiency of large-scale scientific and AI workloads on emerging accelerator platforms.
Advisor: Peng Jiang
Location: MLH B-13 (Please contact Shihui Song if you plan to attend: shihui-song@uiowa.edu)