Random regression forests for acoustic event detection and classification

Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins

45 citations (Scopus)

Abstract

Despite the success of the automatic speech recognition framework in its own application field, its adaptation to the problem of acoustic event detection has resulted in limited success. In this paper, instead of treating the problem similarly to the segmentation and classification tasks in speech recognition, we pose it as a regression task and propose an approach based on random forest regression. Furthermore, event localization in time can be efficiently handled as a joint problem. We first decompose the training audio signals into multiple interleaved superframes, which are annotated with the corresponding event class labels and their displacements to the temporal onsets and offsets of the events. For a specific event category, a random-forest regression model is learned using the displacement information. Given an unseen superframe, the learned regressor outputs continuous estimates of the onset and offset locations of the events. To deal with multiple event categories, a superframe-wise recognition phase is performed prior to the category-specific regression phase to reject the background superframes and to classify the event superframes into different event categories. While jointly posing event detection and localization as a regression problem is novel, the superior performance on two databases, ITC-Irst and UPC-TALP, demonstrates the efficiency and potential of the proposed approach.
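The superframe annotation described in the abstract can be illustrated with a small sketch. It is not the authors' implementation: the superframe length and hop, the rule of labelling a superframe by its centre, and the median used to combine estimates are all assumptions made for illustration, and the names `Superframe`, `annotate_superframes`, and `estimate_boundaries` are hypothetical. In place of a trained random-forest regressor, the example feeds the true displacements back in, which only shows how per-superframe displacement estimates map to an onset and offset.

```python
from dataclasses import dataclass
from statistics import median


@dataclass
class Superframe:
    start: float     # superframe start time (s)
    end: float       # superframe end time (s)
    label: str       # event class, or "background"
    d_onset: float   # distance from the superframe centre back to the event onset (s)
    d_offset: float  # distance from the superframe centre forward to the event offset (s)

    @property
    def center(self) -> float:
        return 0.5 * (self.start + self.end)


def annotate_superframes(duration, events, length=1.0, hop=0.5):
    """Cut [0, duration] into overlapping superframes and label each one.

    events: list of (label, onset, offset) tuples in seconds.
    Assumption: a superframe belongs to an event if its centre lies
    inside the event; otherwise it is background.
    """
    frames = []
    n = 0
    while n * hop + length <= duration + 1e-9:
        start = n * hop
        end = start + length
        c = 0.5 * (start + end)
        label, d_on, d_off = "background", 0.0, 0.0
        for ev_label, onset, offset in events:
            if onset <= c <= offset:
                label, d_on, d_off = ev_label, c - onset, offset - c
                break
        frames.append(Superframe(start, end, label, d_on, d_off))
        n += 1
    return frames


def estimate_boundaries(centers, displacements):
    """Combine per-superframe (d_onset, d_offset) estimates into one onset/offset.

    Assumption: the median is used as the combination rule.
    """
    onsets = [c - d_on for c, (d_on, _) in zip(centers, displacements)]
    offsets = [c + d_off for c, (_, d_off) in zip(centers, displacements)]
    return median(onsets), median(offsets)


# One 4-second recording with a single hypothetical event from 1.0 s to 3.0 s.
frames = annotate_superframes(4.0, [("door_knock", 1.0, 3.0)])
event_frames = [f for f in frames if f.label != "background"]

# Stand-in for the regressor: reuse the annotated displacements as "predictions".
onset, offset = estimate_boundaries(
    [f.center for f in event_frames],
    [(f.d_onset, f.d_offset) for f in event_frames],
)
```

In a full pipeline, the displacement pairs would be the training targets of a category-specific regressor fitted on acoustic features of each superframe, and `estimate_boundaries` would receive the regressor's outputs for unseen superframes.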

Original language: English
Journal: IEEE/ACM Transactions on Audio Speech and Language Processing
Volume: 23
Issue number: 1
Pages (from - to): 20-31
Number of pages: 12
ISSN: 2329-9290
Publication status: Published - 01.01.2015
