On using the auditory image model and invariant-integration for noise robust automatic speech recognition

F. Müller, A. Mertins

Abstract

Commonly used feature extraction methods for automatic speech recognition (ASR) incorporate only rudimentary psychoacoustic findings. Several works have shown that physiologically more accurate auditory processing in the feature extraction stage can enhance the robustness of an ASR system in noisy environments. The "auditory image model" (AIM) is one such more sophisticated computational model. In this work, we show how invariant integration can be applied in the feature space given by the AIM, and we analyze the performance of the resulting features under noisy conditions on the Aurora-2 task. Furthermore, we show that previously presented features based on power normalization and invariant integration benefit from the AIM-based integration features when the feature vectors are combined.
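
The abstract mentions invariant-integration features computed on an auditory spectro-temporal representation. As a rough illustration only (not a reproduction of the paper's method), the Python sketch below shows one common form of such a feature: the average of a monomial of selected channels over a short temporal window. The function name, channel indices, exponents, and window size are hypothetical and chosen purely for demonstration.

```python
import numpy as np

def invariant_integration_feature(tf_rep, channels, exponents, frame, half_width):
    """Average of a monomial of selected channels over a temporal window.

    tf_rep     : 2D array (num_channels x num_frames), e.g. an AIM-derived
                 spectro-temporal representation (hypothetical input here)
    channels   : channel indices entering the monomial
    exponents  : exponent applied to each selected channel
    frame      : center frame index
    half_width : temporal half-width of the integration window
    """
    num_frames = tf_rep.shape[1]
    values = []
    for m in range(-half_width, half_width + 1):
        t = min(max(frame + m, 0), num_frames - 1)  # clamp at utterance edges
        monomial = 1.0
        for c, e in zip(channels, exponents):
            monomial *= tf_rep[c, t] ** e
        values.append(monomial)
    return float(np.mean(values))

# Illustrative usage with random data standing in for an auditory representation
rng = np.random.default_rng(0)
tf_rep = rng.random((30, 200))          # 30 channels, 200 frames
feat = invariant_integration_feature(tf_rep, channels=[4, 11], exponents=[1, 2],
                                     frame=100, half_width=5)
print(feat)
```

Averaging over temporal shifts makes the resulting value insensitive to small translations of the input pattern, which is the basic idea behind translation-invariant integration features.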
Original language: English
Title of host publication: 2012 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 4
Publisher: IEEE
Publication date: 01.03.2012
Pages: 4905-4908
ISBN (Print): 978-1-4673-0045-2
ISBN (Electronic): 978-1-4673-0046-9
DOIs
Publication status: Published - 01.03.2012
Event: 2012 IEEE International Conference on Acoustics, Speech, and Signal Processing - Kyoto, Japan
Duration: 25.03.2012 - 30.03.2012
Conference number: 93091
