Abstract
Generative adversarial networks (GANs) have recently been shown to be efficient for speech enhancement. However, most, if not all, existing speech enhancement GANs (SEGANs) use a single generator to perform a one-stage enhancement mapping. In this work, we propose to use multiple chained generators to perform a multi-stage enhancement mapping, which gradually refines the noisy input signals in a stage-wise fashion. Furthermore, we study two scenarios: (1) the generators share their parameters and (2) the generators' parameters are independent. The former constrains the generators to learn a common mapping that is iteratively applied at all enhancement stages and results in a small model footprint. In contrast, the latter allows the generators to flexibly learn different enhancement mappings at different stages of the network, at the cost of an increased model size. We demonstrate that the proposed multi-stage enhancement approach outperforms the one-stage SEGAN baseline, and that the independent generators lead to more favorable results than the tied generators. The source code is available at http://github.com/pquochuy/idsegan.
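The difference between the two scenarios can be illustrated with a minimal sketch. It is not the authors' implementation: `ToyGenerator`, `build_chain`, `enhance`, and `num_parameters` are hypothetical names, and each generator is reduced to a gain and a bias rather than the network used in the paper. The sketch only shows how chaining works and why tying the parameters keeps the model footprint small.

```python
import random


class ToyGenerator:
    """Hypothetical stand-in for one enhancement generator.

    Assumption: two scalar parameters (gain, bias) replace the
    real network so that the example stays self-contained.
    """

    def __init__(self, seed):
        rng = random.Random(seed)
        self.params = [rng.uniform(0.5, 1.0), rng.uniform(-0.1, 0.1)]

    def __call__(self, signal):
        gain, bias = self.params
        return [gain * s + bias for s in signal]


def build_chain(num_stages, shared):
    """Return a list of generators, one entry per enhancement stage."""
    if shared:
        # Scenario (1): the same generator object is reused at every stage.
        generator = ToyGenerator(seed=0)
        return [generator] * num_stages
    # Scenario (2): each stage has its own independent parameters.
    return [ToyGenerator(seed=i) for i in range(num_stages)]


def enhance(chain, noisy):
    """Feed each stage's output into the next; return all stage outputs."""
    outputs = []
    current = noisy
    for generator in chain:
        current = generator(current)
        outputs.append(current)
    return outputs


def num_parameters(chain):
    """Count parameters, counting a shared generator only once."""
    unique = {id(g): g for g in chain}
    return sum(len(g.params) for g in unique.values())


noisy = [0.1, -0.2, 0.3]
tied = build_chain(num_stages=2, shared=True)
independent = build_chain(num_stages=2, shared=False)
print(num_parameters(tied), num_parameters(independent))
print(enhance(tied, noisy)[-1])
```

With two stages, the tied chain holds the parameters of a single generator while the independent chain holds twice as many, mirroring the footprint trade-off described in the abstract. The number of stages is an illustrative choice here.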
| Original language | English |
|---|---|
| Article number | 9201348 |
| Journal | IEEE Signal Processing Letters |
| Volume | 27 |
| Pages (from-to) | 1700-1704 |
| Number of pages | 5 |
| ISSN | 1070-9908 |
| Publication status | Published - 2020 |
Funding
Manuscript received July 4, 2020; revised August 13, 2020; accepted September 8, 2020. Date of publication September 21, 2020; date of current version October 7, 2020. This work was supported by the Flemish Government (AI Research Program) to Maarten De Vos. The associate editor coordinating the review of this manuscript and approving it for publication was Dr. Nancy F. Chen. (Corresponding author: Huy Phan.) Huy Phan is with the Queen Mary University of London, London E1 4NS, U.K.