Enhancing Vocal Tract Length Normalization with Elastic Registration for Automatic Speech Recognition

Florian Müller, Alfred Mertins

Abstract

Vocal tract length normalization (VTLN) is commonly applied utterance-wise with a warping function that assumes a linear dependence between the vocal tract length and the location of the formants. In this work we propose a data-driven method for enhancing the performance of systems that already use standard VTLN. The method uses elastic registration to estimate optimal non-parametric transformations that further reduce inter-speaker variability. Results show that the proposed method can raise the performance of a monophone system to that of a triphone system.
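
For readers unfamiliar with the baseline, the kind of warping function used in standard VTLN can be sketched as below. This is a minimal illustration of a common piecewise-linear warp, not the elastic-registration method proposed in the paper. The function name `warp_frequency`, the 8 kHz Nyquist frequency, and the breakpoint ratio of 0.875 are assumptions chosen for the example and do not come from the source.

```python
def warp_frequency(f_hz, alpha, f_nyquist=8000.0, breakpoint_ratio=0.875):
    """Piecewise-linear VTLN warp (illustrative sketch, hypothetical parameters).

    Frequencies below a breakpoint are scaled by the warping factor alpha.
    Above it, a second linear segment maps f_nyquist onto itself so that the
    warped axis keeps the original bandwidth.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f_hz <= f_nyquist:
        raise ValueError("f_hz must lie in [0, f_nyquist]")

    # Shrink the breakpoint for alpha > 1 so alpha * f_break stays below Nyquist.
    f_break = breakpoint_ratio * f_nyquist * min(1.0, 1.0 / alpha)
    if f_hz <= f_break:
        return alpha * f_hz

    # Segment from (f_break, alpha * f_break) to (f_nyquist, f_nyquist).
    slope = (f_nyquist - alpha * f_break) / (f_nyquist - f_break)
    return alpha * f_break + slope * (f_hz - f_break)


if __name__ == "__main__":
    # A single factor per utterance is typically chosen from a small grid.
    for alpha in (0.9, 1.0, 1.1):
        print(alpha, [round(warp_frequency(f, alpha), 1) for f in (1000.0, 4000.0, 8000.0)])
```

Because such a warp has a single free parameter per utterance, it can only stretch or compress the frequency axis uniformly below the breakpoint, which is the restriction the non-parametric transformations in the abstract are meant to relax.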
Original language: English
Title of host publication: Proc. Interspeech-2012
Number of pages: 4
Place of Publication: Portland, USA
Publication date: 01.09.2012
Pages: 1362-1365
ISBN (Print): 978-162276759-5
Publication status: Published - 01.09.2012
Event: 13th Annual Conference of the International Speech Communication Association 2012 - Portland, United States
Duration: 09.09.2012 - 13.09.2012
Conference number: 97207
