Nonlinear translation-invariant transformations for speaker-independent speech recognition

Florian Müller, Alfred Mertins

Abstract

The spectral effects of vocal tract length (VTL) changes are one reason why the recognition rate of today's speaker-independent automatic speech recognition (ASR) systems is considerably lower than that of speaker-dependent systems. With certain types of filter banks, these effects can be described as a translation in subband-index space. In this paper, nonlinear translation-invariant transforms that were originally proposed in the field of pattern recognition are investigated for their applicability to speaker-independent ASR tasks. It is shown that combining different types of such transforms leads to features that are more robust against VTL changes than the standard Mel-frequency cepstral coefficients, and that come close to the performance of vocal tract length normalization without any adaptation to individual speakers.
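As a rough illustration of what "translation-invariant" means here, the sketch below computes a circular autocorrelation over a vector of subband energies and checks that it does not change when the vector is shifted along the subband index. This is only a minimal example under stated assumptions: the autocorrelation is a generic shift-invariant feature and is not claimed to be one of the transforms studied in the paper, the shift is modelled as circular (wrap-around), and the frame values and function names are made up for illustration.

```python
def circular_autocorrelation(x):
    """Circular autocorrelation of a sequence.

    The result is unchanged when x is circularly shifted, which makes it
    a simple example of a nonlinear translation-invariant feature.
    """
    n = len(x)
    return [sum(x[i] * x[(i + lag) % n] for i in range(n)) for lag in range(n)]


def circular_shift(x, k):
    """Shift the sequence by k positions with wrap-around."""
    k %= len(x)
    return x[-k:] + x[:-k] if k else list(x)


# Hypothetical subband energies of one analysis frame (made-up values).
frame = [0.0, 1.0, 4.0, 2.0, 0.5, 0.0, 0.0, 0.0]

# Assumption: a VTL change is idealized as a pure shift by two subbands.
shifted = circular_shift(frame, 2)

features = circular_autocorrelation(frame)
features_shifted = circular_autocorrelation(shifted)

# The two feature vectors agree although the inputs differ.
same = all(abs(a - b) < 1e-12 for a, b in zip(features, features_shifted))
print(same)
```

In a real filter-bank output the translation is not circular, so energy shifted past the band edges is lost and the invariance holds only approximately.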
Original language: English
Number of pages: 9
Publication status: Published - 01.06.2009
Event: NOLISP 2009: Workshop on Non-Linear Speech Processing - Vic (Barcelona), Spain
Duration: 25.06.2009 - 27.06.2009
http://www.wikicfp.com/cfp/servlet/event.showcfp?eventid=3313&copyownerid=2

Conference

Conference: NOLISP 2009: Workshop on Non-Linear Speech Processing
Country/Territory: Spain
City: Vic (Barcelona)
Period: 25.06.09 - 27.06.09