Random regression forests for acoustic event detection and classification

Huy Phan, Marco Maaß, Radoslaw Mazur, Alfred Mertins

45 Citations (Scopus)

Abstract

Despite the success of the automatic speech recognition framework in its own application field, its adaptation to the problem of acoustic event detection has met with limited success. In this paper, instead of treating the problem similarly to the segmentation and classification tasks in speech recognition, we pose it as a regression task and propose an approach based on random forest regression. Furthermore, event localization in time can be efficiently handled as a joint problem. We first decompose the training audio signals into multiple interleaved superframes, which are annotated with the corresponding event class labels and their displacements to the temporal onsets and offsets of the events. For a specific event category, a random-forest regression model is learned using the displacement information. Given an unseen superframe, the learned regressor outputs continuous estimates of the onset and offset locations of the events. To deal with multiple event categories, a superframe-wise recognition phase is performed before the category-specific regression phase to reject the background superframes and to classify the event superframes into different event categories. While jointly posing event detection and localization as a regression problem is novel, the superior performance on two databases, ITC-Irst and UPC-TALP, demonstrates the efficiency and potential of the proposed approach.
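
The displacement annotation described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. The function names, the superframe size and hop, the use of the superframe centre as the reference point, and the median used to combine estimates are all assumptions made for illustration. The regressor itself is left out: the displacement targets produced here are what a regression model (for example a random forest) would be trained to predict from superframe features.

```python
from statistics import median

def make_superframes(n_frames, size, hop):
    """Return (start, end) frame indices of overlapping superframes."""
    return [(s, s + size) for s in range(0, n_frames - size + 1, hop)]

def annotate(superframes, onset, offset):
    """Attach displacement targets to superframes centred inside the event.

    Assumption: displacements are measured from the superframe centre
    to the event onset and to the event offset.
    """
    targets = []
    for start, end in superframes:
        centre = (start + end) / 2
        if onset <= centre <= offset:
            targets.append((centre, centre - onset, offset - centre))
    return targets

def localize(predictions):
    """Combine per-superframe displacement estimates into one onset/offset.

    Assumption: the median is used as a simple, outlier-tolerant combiner.
    """
    onsets = [c - d_on for c, d_on, _ in predictions]
    offsets = [c + d_off for c, _, d_off in predictions]
    return median(onsets), median(offsets)

# Hypothetical signal of 100 frames with one event spanning frames 30 to 60.
frames = make_superframes(n_frames=100, size=10, hop=5)
targets = annotate(frames, onset=30, offset=60)

# With exact displacement values standing in for regressor output,
# the original event boundaries are recovered.
print(localize(targets))
```

In practice the tuples passed to `localize` would come from a trained regressor's predictions on unseen superframes rather than from the ground-truth targets used here.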

Original language: English
Journal: IEEE/ACM Transactions on Audio, Speech, and Language Processing
Volume: 23
Issue number: 1
Pages (from-to): 20-31
Number of pages: 12
ISSN: 2329-9290
Publication status: Published - 01.01.2015
