Abstract
This paper presents two loss functions tailored to rare audio event detection in audio streams. The weighted loss is designed to tackle the common problem of imbalanced data in background/foreground classification, while the multi-task loss enables the networks to simultaneously model the class distribution and the temporal structure of the target events for recognition. We study the proposed loss functions with deep neural networks (DNNs) and convolutional neural networks (CNNs) coupled with state-of-the-art phase-aware signal enhancement. Experiments on the DCASE 2017 challenge data show that our system with the proposed losses significantly outperforms not only the DCASE 2017 baseline but also our own baseline, which has a similar network architecture and uses a standard loss function.
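To make the idea of a weighted loss for imbalanced background/foreground data concrete, the following is a minimal sketch of a generic class-weighted binary cross-entropy. It is not the formulation used in the paper, which the abstract does not specify; the function name `weighted_bce` and the weights `w_fg` and `w_bg` are illustrative assumptions.

```python
import math

def weighted_bce(probs, labels, w_fg=1.0, w_bg=1.0, eps=1e-12):
    """Generic class-weighted binary cross-entropy, averaged over frames.

    Assumption: this is a textbook weighting scheme, not the paper's loss.
    probs  -- predicted foreground probability for each frame
    labels -- 1 for foreground (event) frames, 0 for background frames
    w_fg, w_bg -- hypothetical per-class weights
    """
    total = 0.0
    for p, y in zip(probs, labels):
        p = min(max(p, eps), 1.0 - eps)  # clip to keep log() finite
        if y == 1:
            total += -w_fg * math.log(p)
        else:
            total += -w_bg * math.log(1.0 - p)
    return total / len(labels)

# Toy example: one foreground frame among four, all predicted at 0.5.
probs = [0.5, 0.5, 0.5, 0.5]
labels = [1, 0, 0, 0]

plain = weighted_bce(probs, labels)               # unweighted loss
weighted = weighted_bce(probs, labels, w_fg=3.0)  # foreground up-weighted
```

In this toy case, setting `w_fg=3.0` makes the single foreground frame contribute as much to the loss as the three background frames together, which is the basic mechanism by which class weighting counteracts imbalance.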
Original language | English |
---|---|
Title of host publication | 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) |
Number of pages | 5 |
Volume | 2018-April |
Publisher | IEEE |
Publication date | 01.04.2018 |
Pages | 336-340 |
ISBN (Print) | 978-153864658-8 |
Publication status | Published - 01.04.2018 |
Event | 2018 IEEE International Conference on Acoustics, Speech, and Signal Processing - Calgary Telus Convention Center, Calgary, Canada. Duration: 15.04.2018 → 20.04.2018 |