Abstract
In this study, a fair comparison of human and machine speech recognition is established by using the same paradigms for human speech recognition (HSR) and automatic speech recognition (ASR). In order to ensure equal conditions, a speech database specifically designed for this task is used. The results for HSR and ASR are broken down into several intrinsic variabilities like speaking rate, speaking effort and dialect. Across all conditions, ASR error rates are at least 300% higher than those of humans, even though no contextual knowledge can be exploited. A more detailed analysis of errors in HSR and ASR is carried out by decomposing speech into its phonetic features like voicing or manner and place of articulation. Confusion matrices for these features show that voicing information is crucial to distinguish between certain consonants. The most prominent features for ASR often neglect voicing information, which might contribute to the large gap in performance between HSR and ASR.
Original language | English |
---|---|
Pages | 1-6 |
Number of pages | 6 |
Publication status | Published - 01.05.2006 |
Event | Speech Recognition and Intrinsic Variation (SRIV2006) - Toulouse, France Duration: 20.05.2006 → 20.05.2006 |
Conference
Conference | Speech Recognition and Intrinsic Variation (SRIV2006) |
---|---|
Country/Territory | France |
City | Toulouse |
Period | 20.05.06 → 20.05.06 |