Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition

Florian Müller, Alfred Mertins

Abstract

Vocal tract length normalization (VTLN) is commonly applied utterance-wise with a warping function that assumes a linear dependence between the vocal tract length and the location of the formants. In this work we propose a data-driven method for enhancing the performance of systems that already use standard VTLN. The method uses elastic registration to estimate optimal non-parametric transformations that further reduce inter-speaker variability. Results show that the proposed method can raise the performance of monophone systems to that of a triphone system.
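To illustrate the linear assumption mentioned in the abstract, the following is a minimal sketch of a linear frequency warp applied to a single magnitude spectrum. It is not the authors' implementation and does not cover the elastic registration step. The function name `warp_spectrum`, the bin-index frequency axis, and the clamping at the last bin are assumptions made for this example.

```python
def warp_spectrum(spectrum, alpha):
    """Resample a magnitude spectrum on a linearly warped frequency axis.

    Hypothetical helper for illustration only. Output bin k takes the value
    of the input at position alpha * k (linear interpolation, clamped at the
    last bin). Assumes alpha > 0 and a non-empty spectrum.
    """
    n = len(spectrum)
    warped = []
    for k in range(n):
        pos = min(alpha * k, n - 1)   # warped position, clamped to the axis
        lo = int(pos)                 # lower neighbouring bin
        hi = min(lo + 1, n - 1)       # upper neighbouring bin
        frac = pos - lo               # interpolation weight
        warped.append((1 - frac) * spectrum[lo] + frac * spectrum[hi])
    return warped


# Usage: alpha = 1.0 leaves the spectrum unchanged; alpha = 0.5 stretches it.
identity = warp_spectrum([0, 1, 2, 3, 4], 1.0)
stretched = warp_spectrum([0, 2, 4, 6, 8], 0.5)
```

In this sketch a single scalar `alpha` per utterance moves every spectral peak by the same relative amount, which is the restriction that a non-parametric transformation would relax.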
Original language: English
Title of host publication: Proc. Interspeech-2012
Number of pages: 4
Place of publication: Portland, USA
Publication date: 01.09.2012
Pages: 1362-1365
ISBN (Print): 978-162276759-5
Publication status: Published - 01.09.2012
Event: 13th Annual Conference of the International Speech Communication Association 2012 - Portland, USA / United States
Duration: 09.09.2012 to 13.09.2012
Conference number: 97207
