Abstract
Vocal tract length normalization (VTLN) is commonly used in state-of-the-art automatic speech recognition (ASR) systems to reduce the mismatch between speaker-dependent formant frequency scalings. Usually, the normalization is done by a piece-wise linear scaling of the filter bank center frequencies. The linear scaling is motivated by a uniform acoustic tube model that does not take any loss effects into account. Furthermore, it is known that a change in vocal tract length (VTL) yields different spectral effects for different phonemes. However, these phoneme-dependent differences are usually not explicitly considered in the common VTLN processing. In this work, we consider a vocal tract model that has been developed within the field of articulatory speech synthesis. The model mimics the vocal tract geometry for different phonemes and simulates naturally occurring loss effects like yielding wall vibrations and viscous losses. An elastic registration method is used to determine the relating transforms between the spectral envelopes of vocal tracts with different lengths. The resulting warping functions are analyzed with respect to their application for VTLN in ASR systems.
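The conventional piece-wise linear scaling mentioned in the abstract can be illustrated with a minimal sketch. It shows only the common baseline, not the warping functions derived in this work. The function name, the 8 kHz upper band edge, the break-point ratio of 0.875, and the example center frequencies are all assumptions chosen for illustration.

```python
def piecewise_linear_warp(f, alpha, f_max=8000.0, break_ratio=0.875):
    """Warp a frequency f (Hz) by the factor alpha.

    Below the break frequency the mapping is a pure scaling by alpha.
    Above it, a second linear segment is used so that f_max maps to
    f_max and the warped axis covers the same band as the original.
    f_max and break_ratio are illustrative assumptions.
    """
    if alpha <= 0.0:
        raise ValueError("alpha must be positive")
    if not 0.0 <= f <= f_max:
        raise ValueError("f must lie in [0, f_max]")

    # Place the break point so that alpha * f_break never exceeds f_max.
    f_break = break_ratio * f_max * min(1.0, 1.0 / alpha)

    if f <= f_break:
        return alpha * f

    # Second segment joins (f_break, alpha * f_break) to (f_max, f_max).
    slope = (f_max - alpha * f_break) / (f_max - f_break)
    return alpha * f_break + slope * (f - f_break)


def warp_center_frequencies(centers, alpha, f_max=8000.0):
    """Apply the warp to a list of filter bank center frequencies (Hz)."""
    return [piecewise_linear_warp(fc, alpha, f_max) for fc in centers]


# Hypothetical center frequencies, for illustration only.
centers = [250.0, 1000.0, 2500.0, 5000.0, 7500.0]
warped = warp_center_frequencies(centers, alpha=1.1)
```

Because the warp depends only on the single factor `alpha`, it scales the spectrum identically for every phoneme, which is the limitation the abstract points out.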
Original language | English |
---|---|
Number of pages | 2 |
Publication status | Published - 01.03.2012 |
Event | 38. Deutschen Jahrestagung für Akustik - Darmstadt, Germany; Duration: 19.03.2012 → 22.03.2012 |
Conference
Conference | 38. Deutschen Jahrestagung für Akustik |
---|---|
Abbreviated title | DAGA 2012 |
Country/Territory | Germany |
City | Darmstadt |
Period | 19.03.12 → 22.03.12 |