TY - JOUR
T1 - Improving GANs for Speech Enhancement
AU - Phan, Huy
AU - McLoughlin, Ian V.
AU - Pham, Lam
AU - Chen, Oliver Y.
AU - Koch, Philipp
AU - De Vos, Maarten
AU - Mertins, Alfred
N1 - Funding Information:
Manuscript received July 4, 2020; revised August 13, 2020; accepted September 8, 2020. Date of publication September 21, 2020; date of current version October 7, 2020. This work was supported by the Flemish Government (AI Research Program) to Maarten De Vos. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Nancy F. Chen. (Corresponding author: Huy Phan.) Huy Phan is with the Queen Mary University of London, London E1 4NS, U.K. (e-mail: [email protected]).
Publisher Copyright:
© 1994-2012 IEEE.
PY - 2020
Y1 - 2020
N2 - Generative adversarial networks (GAN) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGAN) make use of a single generator to perform one-stage enhancement mapping. In this work, we propose to use multiple generators that are chained to perform multi-stage enhancement mapping, which gradually refines the noisy input signals in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint. On the contrary, the latter allows the generators to flexibly learn different enhancement mappings at different stages of the network at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, where the independent generators lead to more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
UR - http://www.scopus.com/inward/record.url?scp=85092708306&partnerID=8YFLogxK
U2 - 10.1109/LSP.2020.3025020
DO - 10.1109/LSP.2020.3025020
M3 - Journal articles
AN - SCOPUS:85092708306
SN - 1070-9908
VL - 27
SP - 1700
EP - 1704
JO - IEEE Signal Processing Letters
JF - IEEE Signal Processing Letters
M1 - 9201348
ER -