PhD Candidate: Yumna Anwar
Abstract:
The rapid advancement of deep neural networks (DNNs) has transformed audio processing, unlocking new possibilities in healthcare, particularly in fields such as epidemiology and audiology. Despite their substantial advantages over traditional signal processing methods, deploying DNNs in real-world healthcare applications remains challenging due to practical barriers such as limited data, environmental variability, or constrained hardware. This thesis explores these challenges through two healthcare applications that leverage DNNs to analyze audio data.
To ground these challenges in real-world settings, the first study investigates automated cough detection for respiratory disease surveillance, a task with significant potential in disease monitoring and clinical decision support, as cough sounds can serve as biomarkers of respiratory conditions. Reliable detection is complicated by the scarcity of real-world, naturally occurring cough data (most available datasets rely on voluntary coughs), by data imbalance, and by the acoustic similarity of coughs to background sounds. To overcome these challenges, we collected a dataset of cough and non-cough sounds from clinic waiting rooms where respiratory symptoms are prevalent. A contribution of this thesis is that we have made this dataset available to other researchers. Using this dataset, we trained an ensemble of convolutional neural networks that achieved an ROC AUC of 98.1%, demonstrating the feasibility of automated cough detection as a practical biomarker for tracking respiratory illness trends in the community.
Building on this focus on real-world applicability, the second study investigates real-time speech enhancement for hearing aids, which must operate under strict computational constraints and a latency budget of approximately 15 ms. We propose an edge-assisted framework that employs compact, environment-specific denoising models based on a time-domain U-Net/DEMUCS architecture with an LSTM bottleneck. On a CPU-only Raspberry Pi 4, these small models meet real-time processing requirements while maintaining high audio quality. Across 13 noise environments, noise-specific models outperform universal ones in PESQ, with user studies confirming higher listener preference.
Together, these studies bridge the gap between theoretical model development and practical healthcare deployment, advancing robust, context-aware, and resource-efficient DNNs that adapt effectively to the diverse conditions of real-world environments.
Advisor: Octav Chipara
Location: Zoom. (Please contact Yumna, yumna-anwar@uiowa.edu, if you plan to attend.)