CMU 18786 (Fall 2021) Team 1 Project: Speech Audio Denoising
One current state-of-the-art approach to speech enhancement is the Large DCU-Net 20 trained with Phone-Fortified Perceptual Loss (PFPL). It enhances speech well under Gaussian noise. However, it fails to denoise speech under realistic environmental background noise, because PFPL minimizes the Wasserstein distance and can damage the speech itself while removing noise. To solve this problem, we introduce a new ASR Boosted Perceptual Loss (ABPL), which incorporates a criterion based on the ASR speech-to-text output into the PFPL loss function to prevent loss of the speech signal. The evaluation metrics we use are PESQ, CSIG, CBAK, COVL, and SegSNR. Our method, which combines ABPL with PFPL, outperforms the original PFPL approach under realistic environmental noise by approximately 10% on average across all evaluation metrics.
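Of the metrics listed above, SegSNR is the simplest to state in code. The following is a minimal sketch of the common definition of segmental SNR: the SNR in dB is computed per frame, clamped to a fixed range, and averaged. The function name `segmental_snr`, the frame length, the hop size, and the clamp range of -10 dB to 35 dB are assumptions chosen for illustration; the project's actual evaluation settings are not stated here.

```python
import math


def segmental_snr(clean, enhanced, frame_len=256, hop=128,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Mean per-frame SNR in dB, each frame clamped to [min_db, max_db].

    `clean` and `enhanced` are equal-length sequences of float samples.
    All default parameter values are illustrative assumptions.
    """
    if len(clean) != len(enhanced):
        raise ValueError("signals must have the same length")

    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        c = clean[start:start + frame_len]
        e = enhanced[start:start + frame_len]
        # Energy of the reference frame and of the residual error.
        signal_energy = sum(x * x for x in c)
        noise_energy = sum((x - y) ** 2 for x, y in zip(c, e))
        # eps keeps the ratio finite for silent or perfectly matched frames.
        snr_db = 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))
        # Clamping stops a few extreme frames from dominating the mean.
        frame_snrs.append(min(max(snr_db, min_db), max_db))

    if not frame_snrs:
        raise ValueError("signal is shorter than one frame")
    return sum(frame_snrs) / len(frame_snrs)
```

With these assumed defaults, an enhanced signal identical to the clean reference scores the upper clamp value, and an all-zero output scores 0 dB on every non-silent frame, since the residual energy then equals the signal energy.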