Robust Features for Speaker-Independent Speech Recognition Based on a Certain Class of Translation-Invariant Transformations

Florian Müller, Alfred Mertins

Abstract

The spectral effects of vocal tract length (VTL) differences are one reason for the lower recognition rate of today's speaker-independent automatic speech recognition (ASR) systems compared to speaker-dependent ones. With certain types of filter banks, the VTL-related effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transformations that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transformations yields features that are more robust against VTL changes than the standard mel-frequency cepstral coefficients, and that these features come close to the performance of vocal tract length normalization without any adaptation to individual speakers.
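
The idea of a translation-invariant feature can be illustrated with a minimal sketch. It is not taken from the paper: the circular autocorrelation used here is only one simple, well-known translation-invariant transformation, chosen for illustration, and the function names and subband values are hypothetical. The sketch assumes an idealized circular shift along the subband index.

```python
def circular_autocorrelation(x):
    """Return r[k] = sum_n x[n] * x[(n + k) % N] for k = 0..N-1."""
    n = len(x)
    return [sum(x[i] * x[(i + k) % n] for i in range(n)) for k in range(n)]


def circular_shift(x, s):
    """Shift the sequence x by s positions to the right, wrapping around."""
    n = len(x)
    return [x[(i - s) % n] for i in range(n)]


# Hypothetical subband energies for one frame (illustrative values only).
frame = [0.0, 1.0, 4.0, 2.0, 0.0, 0.0, 3.0, 1.0]

# Assumption: a VTL change moves the pattern by two subbands (circularly).
shifted = circular_shift(frame, 2)

# The two inputs differ, but their autocorrelation features coincide.
features = circular_autocorrelation(frame)
features_shifted = circular_autocorrelation(shifted)
```

In a real filter bank the translation is not circular: spectral content moves out of the analyzed band at the edges, so the invariance would hold only approximately.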
Original language: English
Title of host publication: Advances in Nonlinear Speech Processing
Editors: Jordi Solé-Casals, Vladimir Zaiats
Number of pages: 9
Volume: 5933
Place of Publication: Berlin, Heidelberg
Publisher: Springer Berlin Heidelberg
Publication date: 2010
Pages: 111-119
ISBN (Print): 978-3-642-11508-0
ISBN (Electronic): 978-3-642-11509-7
Publication status: Published - 2010
Event: International Conference on Nonlinear Speech Processing 2009 - Vic, Spain
Duration: 25.06.2009 - 27.06.2009
Conference number: 79990
