Abstract
The spectral effects of vocal tract length (VTL) differences are one reason why today's speaker-independent automatic speech recognition (ASR) systems achieve lower recognition rates than speaker-dependent ones. With certain types of filter banks, VTL-related effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transformations that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transformations yields features that are more robust against VTL changes than the standard mel-frequency cepstral coefficients, and that these features nearly match the performance of vocal tract length normalization without any adaptation to individual speakers.
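As a rough illustration of what "translation-invariant" means here, the sketch below computes a circular autocorrelation over the subband index and checks that it does not change when the input is shifted. This is only an assumed stand-in: the abstract does not name the specific transformations studied in the paper, the subband values are made up, and a real VTL-induced translation would not wrap around at the band edges as the circular shift used here does.

```python
def circular_autocorrelation(x):
    """Return r[k] = sum_i x[i] * x[(i + k) % N] for k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


def circular_shift(x, s):
    """Shift x by s positions to the right with wrap-around."""
    s %= len(x)
    return x[-s:] + x[:-s]


# Hypothetical subband energies of one frame (illustrative values only).
frame = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 3.0, 1.0]

# Assumption: a VTL change moves the spectral pattern by two subbands.
shifted = circular_shift(frame, 2)

features = circular_autocorrelation(frame)
features_shifted = circular_autocorrelation(shifted)

# The raw subband vectors differ, but the derived features coincide.
print(frame != shifted, features == features_shifted)
```

The autocorrelation discards the absolute position of the pattern along the subband axis while keeping the relative spacing of its peaks, which is the property a translation-invariant feature needs in this setting.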
Original language | English |
---|---|
Title | Advances in Nonlinear Speech Processing |
Editors | Jordi Solé-Casals, Vladimir Zaiats |
Number of pages | 9 |
Volume | 5933 |
Place of publication | Berlin, Heidelberg |
Publisher | Springer Berlin Heidelberg |
Publication date | 2010 |
Pages | 111-119 |
ISBN (Print) | 978-3-642-11508-0 |
ISBN (Electronic) | 978-3-642-11509-7 |
DOIs | |
Publication status | Published - 2010 |
Event | International Conference on Nonlinear Speech Processing 2009 - Vic, Spain. Duration: 25.06.2009 → 27.06.2009. Conference number: 79990 |