On the Use of a Wave-Reflection Model for the Estimation of Spectral Effects due to Vocal Tract Length Changes with Application to Automatic Speech Recognition

Florian Müller, Alfred Mertins

Abstract

Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss effects into account. Furthermore, it is known that a change in vocal tract length (VTL) yields different spectral effects for different phonemes. However, these phoneme-dependent differences are usually not explicitly considered in the common VTLN processing. In this work, we consider a vocal tract model that has been developed within the field of articulatory speech synthesis. The model mimics the vocal tract geometry for different phonemes and simulates naturally occurring loss effects like yielding wall vibrations and viscous losses. An elastic registration method is used to determine the relating transforms between the spectral envelopes of vocal tracts with different lengths. The resulting warping functions are analyzed w.r.t. their application for VTLN in ASR systems.
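
The conventional processing that the abstract takes as its starting point can be illustrated with a minimal sketch. It covers only the lossless uniform-tube resonances and a piece-wise linear warp of filter bank center frequencies, not the wave-reflection model or the elastic registration studied in this work. All function names, the speed of sound (350 m/s), the band edge (8000 Hz), and the breakpoint ratio (0.85) are assumptions chosen for illustration and do not come from the paper.

```python
import numpy as np

SPEED_OF_SOUND = 350.0  # m/s; assumed value, not taken from the paper


def uniform_tube_formants(length_m, n_formants=3, c=SPEED_OF_SOUND):
    """Resonances of a lossless uniform tube closed at one end and open
    at the other: F_n = (2n - 1) * c / (4 * L)."""
    n = np.arange(1, n_formants + 1)
    return (2 * n - 1) * c / (4.0 * length_m)


def piecewise_linear_warp(freqs_hz, alpha, f_max_hz=8000.0, break_ratio=0.85):
    """Hypothetical piece-wise linear VTLN warp.

    Frequencies below the breakpoint are scaled by alpha; above it, a
    second linear segment maps f_max_hz onto itself so that the warped
    axis still ends at the band edge.
    """
    freqs = np.asarray(freqs_hz, dtype=float)
    f_break = break_ratio * f_max_hz
    # Slope of the upper segment, chosen so that warp(f_max) == f_max.
    upper_slope = (f_max_hz - alpha * f_break) / (f_max_hz - f_break)
    lower = alpha * freqs
    upper = alpha * f_break + upper_slope * (freqs - f_break)
    return np.where(freqs <= f_break, lower, upper)


if __name__ == "__main__":
    # In the uniform-tube model, shortening the tube scales every formant
    # by the same factor, which is what motivates a linear warp.
    long_tract = uniform_tube_formants(0.175)
    short_tract = uniform_tube_formants(0.140)
    print("formant ratios:", short_tract / long_tract)

    # Warp a set of (assumed) filter bank center frequencies.
    centers = np.linspace(100.0, 8000.0, 8)
    print("warped centers:", piecewise_linear_warp(centers, alpha=1.1))
```

In this idealized model every formant moves by the same ratio of tube lengths, so a single warping factor suffices. The abstract's point is that a more realistic vocal tract model with phoneme-specific geometry and losses need not behave this uniformly.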
Original language: English
Number of pages: 2
Publication status: Published - 01.03.2012
Event: 38. Deutschen Jahrestagung für Akustik, Darmstadt, Germany
Duration: 19.03.2012 to 22.03.2012

Event type: Conference
Short title: DAGA 2012
