CNN-LTE: A Class of 1-X Pooling Convolutional Neural Networks on Label Tree Embeddings for Audio Scene Classification

Huy Phan, Philipp Koch, Lars Hertel, Marco Maass, Radoslaw Mazur, Alfred Mertins

Abstract

We present an approach for audio scene classification. First, given the label set of the scenes, a label tree is automatically constructed in which the labels are grouped into meta-classes. This category taxonomy is then used in the feature extraction step, in which an audio scene instance is transformed into a label tree embedding image. Elements of the image indicate the likelihoods that the scene instance belongs to different meta-classes. Finally, a class of simple 1-X (i.e. 1-max, 1-mean, and 1-mix) pooling convolutional neural networks, tailored to the task at hand, is learned on top of the image features for scene recognition. Experimental results on the DCASE 2013 and DCASE 2016 datasets demonstrate the efficiency of the proposed method.
Original language: English
Title of host publication: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP)
Number of pages: 5
Place of publication: New Orleans
Publisher: IEEE
Publication date: 01.03.2017
Pages: 136-140
ISBN (Print): 978-1-5386-2220-9
ISBN (electronic): 978-1-5386-2219-3
Publication status: Published - 01.03.2017
Event: 2017 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP) - New Orleans, United States
Duration: 05.03.2017 to 09.03.2017
