CMU 18786 (Fall 2021) Team 1 Project: Speech Audio Denoising

One current state-of-the-art approach to speech enhancement is the Large DCU-Net 20 trained with Phone-Fortified Perceptual Loss (PFPL). It enhances speech well under Gaussian noise. However, it fails to denoise speech under realistic environmental background noise: the PFPL minimizes the Wasserstein distance, and in doing so it can damage the human speech while removing the noise. To solve this problem, we introduce a new ASR Boosted Perceptual Loss (ABPL), which merges a criterion based on the ASR speech-to-text outcome into the PFPL loss function to prevent loss of the speech signal. We evaluate with PESQ, CSIG, CBAK, COVL, and SegSNR. Under realistic environmental noise, our method (merging ABPL with PFPL) outperforms the original PFPL approach by approximately 10% on average across all evaluation metrics.
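Of the metrics listed above, SegSNR is the simplest to compute directly from waveforms, so a minimal sketch may help make it concrete. The function name `segmental_snr`, the frame length, the hop size, and the clipping range of -10 dB to 35 dB are assumptions for illustration (they follow a common convention) and are not taken from the project itself.

```python
import numpy as np


def segmental_snr(clean, enhanced, frame_len=512, hop=256,
                  min_db=-10.0, max_db=35.0, eps=1e-10):
    """Average frame-wise SNR in dB between a clean and an enhanced signal.

    Hypothetical helper: frame_len, hop, and the clipping bounds are
    assumed defaults, not values reported by the project.
    """
    clean = np.asarray(clean, dtype=np.float64)
    enhanced = np.asarray(enhanced, dtype=np.float64)
    if clean.ndim != 1 or clean.shape != enhanced.shape:
        raise ValueError("clean and enhanced must be 1-D arrays of equal length")
    if len(clean) < frame_len:
        raise ValueError("signal is shorter than one frame")

    # Residual error left in the enhanced signal.
    noise = clean - enhanced

    frame_snrs = []
    for start in range(0, len(clean) - frame_len + 1, hop):
        s = clean[start:start + frame_len]
        n = noise[start:start + frame_len]
        # Energy ratio per frame; eps avoids division by zero and log(0).
        snr_db = 10.0 * np.log10((np.sum(s ** 2) + eps) / (np.sum(n ** 2) + eps))
        # Clip each frame so silent or perfect frames do not dominate the mean.
        frame_snrs.append(np.clip(snr_db, min_db, max_db))

    return float(np.mean(frame_snrs))


# Usage: a synthetic tone stands in for speech, with additive Gaussian noise
# standing in for an imperfectly enhanced output.
rng = np.random.default_rng(0)
t = np.arange(16000) / 16000.0
clean_tone = np.sin(2 * np.pi * 440.0 * t)
noisy_tone = clean_tone + 0.05 * rng.standard_normal(t.shape)
print(f"SegSNR: {segmental_snr(clean_tone, noisy_tone):.2f} dB")
```

Clipping each frame before averaging is the design choice that distinguishes SegSNR from a single global SNR: it keeps near-silent frames or perfectly reconstructed frames from skewing the average.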
