A Human-Machine Comparison in Speech Recognition Based on a Logatome Corpus

Bernd Meyer, Thorsten Wesker, Thomas Brand, Alfred Mertins, Birger Kollmeier

Abstract

In this study, a fair comparison of human and machine speech recognition is established by using the same paradigms for human speech recognition (HSR) and automatic speech recognition (ASR). In order to ensure equal conditions, a speech database specifically designed for this task is used. The results for HSR and ASR are broken down into several intrinsic variabilities like speaking rate, speaking effort and dialect. Across all conditions, ASR error rates are at least 300 % higher than those of humans, even though no contextual knowledge can be exploited. A more detailed analysis of errors in HSR and ASR is carried out by decomposing speech into its phonetic features like voicing or manner and place of articulation. Confusion matrices for these features show that voicing information is crucial to distinguish between certain consonants. The most prominent features for ASR often neglect voicing information, which might contribute to the large gap in performance between HSR and ASR.
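
The feature-level analysis described in the abstract can be illustrated with a minimal Python sketch. It collapses phoneme-level (presented, recognized) pairs into a confusion count over a single phonetic feature, here voicing. The function names, the small consonant set, and the response pairs are all assumptions made for illustration; they are not taken from the study or its corpus, and the sketch is not the authors' analysis procedure.

```python
from collections import Counter

# Voicing labels for a handful of stop consonants (illustrative subset only).
VOICING = {
    "p": "voiceless", "t": "voiceless", "k": "voiceless",
    "b": "voiced", "d": "voiced", "g": "voiced",
}


def feature_confusions(pairs, feature_map):
    """Collapse (presented, recognized) phoneme pairs into feature-level counts."""
    counts = Counter()
    for presented, recognized in pairs:
        counts[(feature_map[presented], feature_map[recognized])] += 1
    return counts


def feature_error_rate(counts):
    """Fraction of responses whose feature value differs from the presented one."""
    total = sum(counts.values())
    errors = sum(n for (presented, recognized), n in counts.items()
                 if presented != recognized)
    return errors / total if total else 0.0


# Made-up responses, NOT data from the study.
toy_pairs = [
    ("p", "p"), ("p", "b"), ("t", "t"), ("d", "t"),
    ("g", "g"), ("k", "k"), ("b", "b"), ("b", "p"),
]

voicing_counts = feature_confusions(toy_pairs, VOICING)
voicing_error = feature_error_rate(voicing_counts)
```

Running the same two functions on listener responses and on recognizer output would give per-feature error rates that can be compared directly; other features such as manner or place of articulation only need a different label map.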
Original language: English
Pages: 1-6
Number of pages: 6
Publication status: Published - 01.05.2006
Event: Speech Recognition and Intrinsic Variation (SRIV2006), Toulouse, France
Duration: 20.05.2006 - 20.05.2006

Conference

Conference: Speech Recognition and Intrinsic Variation (SRIV2006)
Country/Territory: France
City: Toulouse
Period: 20.05.06 - 20.05.06
