Abstract
This paper proposes an approach for the efficient automatic joint detection and localization of single-channel acoustic events using random forest regression. The audio signals are decomposed into multiple densely overlapping superframes annotated with event class labels and their displacements to the temporal starting and ending points of the events. Using the displacement information, a multivariate random forest regression model is learned for each event category to map each superframe to continuous estimates of onset and offset locations of the events. In addition, two classifiers are trained using random forest classification to classify superframes of background and different event categories. On testing, based on the detection of category-specific superframes using the classifiers, the learned regressor provides the estimates of onset and offset locations in time of the corresponding event. While posing event detection and localization as a regression problem is novel, the quantitative evaluation on ITC-Irst database of highly variable acoustic events shows the efficiency and potential of the proposed approach.
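The superframe annotation described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the superframe length, the hop size, the function names `annotate_superframes` and `localize`, and the use of a plain average to combine per-superframe estimates are all assumptions made for illustration. The random forest models are left out, so the annotated displacements stand in for what a trained regressor would predict.

```python
from statistics import mean


def annotate_superframes(duration, events, length=0.5, hop=0.1):
    """Cut [0, duration] into overlapping superframes and annotate each one.

    events: list of (label, onset, offset) tuples, times in seconds.
    length and hop are illustrative values, not taken from the paper.
    """
    frames = []
    n = int(round((duration - length) / hop)) + 1
    for i in range(n):
        center = i * hop + length / 2
        frame = {"center": center, "label": "background",
                 "d_onset": 0.0, "d_offset": 0.0}
        for name, onset, offset in events:
            if onset <= center <= offset:
                # Displacements from the superframe centre to the event
                # boundaries; these are the regression targets.
                frame.update(label=name,
                             d_onset=center - onset,
                             d_offset=offset - center)
                break
        frames.append(frame)
    return frames


def localize(frames, label):
    """Combine per-superframe onset/offset estimates for one event class.

    Averaging is an assumed aggregation rule. With a trained model, the
    displacements would be regressor outputs rather than annotations.
    """
    hits = [f for f in frames if f["label"] == label]
    if not hits:
        return None
    onset = mean(f["center"] - f["d_onset"] for f in hits)
    offset = mean(f["center"] + f["d_offset"] for f in hits)
    return onset, offset


# Hypothetical 3-second recording with one event from 1.0 s to 2.0 s.
frames = annotate_superframes(3.0, [("door_knock", 1.0, 2.0)])
estimate = localize(frames, "door_knock")
```

Each superframe inside an event casts its own estimate of the onset and offset, so densely overlapping superframes give many estimates per event that can then be combined.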
Original language | English |
---|---|
Title of host publication | Proc. 15th Annual Conference of the International Speech Communication Association (INTERSPEECH 2014) |
Number of pages | 5 |
Place of Publication | Singapore |
Publisher | International Speech Communication Association (ISCA) |
Publication date | 01.09.2014 |
Pages | 2524-2528 |
Publication status | Published - 01.09.2014 |
Event | 15th Annual Conference of the International Speech Communication Association: Celebrating the Diversity of Spoken Languages - Max Atria at Singapore Expo, Singapore, Singapore. Duration: 14.09.2014 → 18.09.2014. Conference number: 108771 |