Robust Features for Speaker-Independent Speech Recognition Based on a Certain Class of Translation-Invariant Transformations

Florian Müller, Alfred Mertins

Abstract

The spectral effects of vocal tract length (VTL) differences are one reason why today's speaker-independent automatic speech recognition (ASR) systems achieve lower recognition rates than speaker-dependent ones. With certain types of filter banks, the VTL-related effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transformations that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transformations leads to features that are more robust against VTL changes than the standard mel-frequency cepstral coefficients, and that these features almost reach the performance of vocal tract length normalization without any adaptation to individual speakers.
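The following Python sketch shows what a nonlinear translation-invariant transformation of a subband vector can look like. It implements the rapid transform (R-transform) of Reitboeck and Brody, a classic transform from pattern recognition whose output does not change when its input is cyclically shifted. This is an illustration under stated assumptions, not the authors' feature extraction. The abstract does not say which transforms, filter banks, or numbers of subbands were used. Modelling a VTL change as a cyclic shift of a power-of-two number of subband values is a simplification, because a real VTL change moves energy in and out at the band edges. The function names `rapid_transform` and `cyclic_shift` and the frame values are hypothetical.

```python
from typing import List


def rapid_transform(x: List[float]) -> List[float]:
    """Rapid transform (R-transform) of a vector whose length is a power of two.

    Each stage splits every block into two halves (a, b) and replaces them by
    the element-wise sum a + b and the absolute difference |a - b|. The next
    stage operates on each resulting half separately, until the blocks have
    length 1. The output is invariant to cyclic shifts of the input.
    """
    n = len(x)
    if n == 0 or n & (n - 1):
        raise ValueError("length must be a power of two")
    y = list(x)
    block = n
    while block > 1:
        half = block // 2
        for start in range(0, n, block):
            for i in range(start, start + half):
                a, b = y[i], y[i + half]
                y[i], y[i + half] = a + b, abs(a - b)
        block = half
    return y


def cyclic_shift(x: List[float], k: int) -> List[float]:
    """Cyclically shift x to the right by k positions.

    Hypothetical stand-in for a VTL-induced translation along the subband index.
    """
    k %= len(x)
    return x[-k:] + x[:-k]


# Made-up log-energies of 8 subbands for a single frame.
frame = [1.0, 3.0, 2.0, 5.0, 4.0, 0.5, 1.5, 2.5]
shifted = cyclic_shift(frame, 2)
print(rapid_transform(frame) == rapid_transform(shifted))  # True
```

Each stage forms only sums and absolute differences of pairs, so the R-transform is also invariant to some other rearrangements, for example reversing the subband order. It therefore discards part of the information about the spectral shape. That suggests why combining several different transforms of this kind may give more discriminative features, although the abstract does not state the authors' rationale.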
Original language: English
Title: Advances in Nonlinear Speech Processing
Editors: Jordi Solé-Casals, Vladimir Zaiats
Number of pages: 9
Volume: 5933
Place of publication: Berlin, Heidelberg
Publisher: Springer Berlin Heidelberg
Publication date: 2010
Pages: 111-119
ISBN (print): 978-3-642-11508-0
ISBN (electronic): 978-3-642-11509-7
Publication status: Published - 2010
Event: International Conference on Nonlinear Speech Processing 2009 - Vic, Spain
Duration: 25.06.2009 to 27.06.2009
Conference number: 79990
