
Contextual invariant-integration features for improved speaker-independent speech recognition

Florian Müller*, Alfred Mertins

*Corresponding author for this work

Abstract

This work presents a feature-extraction method based on the theory of invariant integration. The invariant-integration features (IIFs) are derived from an extended time period, and their computation has very low complexity. Recognition experiments show superior performance of the presented feature type compared to cepstral coefficients using a mel filterbank (MFCCs) or a gammatone filterbank (GTCCs), in matching as well as in mismatching training-testing conditions. Even without any speaker adaptation, the presented features yield higher accuracies than MFCCs combined with vocal tract length normalization (VTLN) in matching training-testing conditions. It is also shown that the IIFs can be successfully combined with additional speaker-adaptation methods to further increase the accuracy. In addition to standard MFCCs, contextual MFCCs are introduced; their performance lies between that of MFCCs and IIFs.
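To make the idea concrete, the following is a minimal sketch of how an invariant-integration feature of the kind described in the abstract might be computed: a monomial of selected subband energies, averaged (integrated) over a temporal context window. The function name, parameters, and the specific monomial form are illustrative assumptions, not the authors' exact formulation.

```python
import numpy as np

def iif_sketch(spectrogram, components, exponents, context=7):
    """Hedged sketch of an invariant-integration feature (IIF).

    spectrogram : (num_frames, num_bands) array of non-negative subband energies
    components  : indices of the subbands entering the monomial (assumption)
    exponents   : exponent applied to each selected subband (assumption)
    context     : frames on each side of the current frame to average over
    """
    num_frames, _ = spectrogram.shape
    feats = np.empty(num_frames)
    for t in range(num_frames):
        # Clip the context window at the signal boundaries.
        lo = max(0, t - context)
        hi = min(num_frames, t + context + 1)
        window = spectrogram[lo:hi, components]          # (frames, n_components)
        monomial = np.prod(window ** exponents, axis=1)  # monomial per frame
        feats[t] = monomial.mean()                       # integrate over time
    return feats
```

In this reading, the averaging over the extended time period is what keeps the per-frame cost low: each feature is a handful of multiplications and one mean per frame.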

Original language: English
Journal: Speech Communication
Volume: 53
Issue number: 6
Pages (from - to): 830-841
Number of pages: 12
ISSN: 0167-6393
DOIs
Publication status: Published - 01.07.2011

Funding

This work has been supported by the German Research Foundation under Grant No. ME1170/2-1.

UN SDGs

This output contributes to the following Sustainable Development Goal(s):

  1. SDG 9 – Industry, Innovation and Infrastructure
