Irene Martín-Morató, Máximo Cobos, Francesc J. Ferri
In recent years, deep convolutional neural networks have become a standard for the development of state-of-the-art audio classification systems, taking the lead over traditional approaches based on feature engineering. While they are capable of achieving human-level performance under certain scenarios, it has been shown that their accuracy is severely degraded when the systems are tested on noisy or weakly segmented events. Although better generalization could be obtained by increasing the size of the training dataset, e.g., by applying data augmentation techniques, this also leads to longer and more complex training procedures. In this paper, we propose a new type of pooling layer aimed at compensating for non-relevant information in audio events by applying an adaptive transformation of the convolutional feature maps along the temporal axis. The proposed layer performs a non-linear temporal transformation that follows a uniform distance subsampling criterion in the learned feature space. Experiments conducted over different datasets show significant performance improvements when the proposed layer is added to baseline models, resulting in systems that generalize better to mismatched test conditions and learn more robustly from weakly labeled data.
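The abstract does not include an implementation, but the "uniform distance subsampling" criterion can be illustrated with a minimal NumPy sketch: treat the frames of a (T, C) feature map as a trajectory in feature space, measure the cumulative Euclidean distance along it, and resample the map so that output frames are equally spaced in that distance rather than in time. The function name and the choice of linear interpolation below are illustrative assumptions, not the authors' exact layer.

```python
import numpy as np

def distance_based_temporal_pooling(feature_map, out_frames):
    """Resample a (T, C) feature map to out_frames rows so that
    consecutive output frames are equidistant along the cumulative
    Euclidean path traced by the input frames (one plausible reading
    of a uniform distance subsampling criterion; illustrative only)."""
    T, _ = feature_map.shape
    # Euclidean distance between consecutive frames in feature space.
    step = np.linalg.norm(np.diff(feature_map, axis=0), axis=1)
    # Cumulative arc length along the trajectory, starting at 0.
    arc = np.concatenate([[0.0], np.cumsum(step)])
    if arc[-1] == 0.0:
        # Constant input: fall back to plain uniform time sampling.
        idx = np.linspace(0, T - 1, out_frames)
    else:
        # Target arc-length positions, uniformly spaced along the path.
        targets = np.linspace(0.0, arc[-1], out_frames)
        # Map each target back to a (fractional) input frame index.
        idx = np.interp(targets, arc, np.arange(T))
    # Linearly interpolate between the two surrounding input frames.
    lo = np.floor(idx).astype(int)
    hi = np.minimum(lo + 1, T - 1)
    w = (idx - lo)[:, None]
    return (1.0 - w) * feature_map[lo] + w * feature_map[hi]
```

Regions where the features change rapidly (e.g., an acoustic event) accumulate distance quickly and therefore receive more output frames, while near-stationary background spans are compressed, which matches the stated goal of suppressing non-relevant temporal content.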
Minkyu Lim, Dong-Hyun Lee, Hosung Park, Yoseb Kang, Jun-Seok Oh, Jeong-Sik Park, Gil-Jin Jang, Ji-Hwan Kim
Shuifei Zeng, Yan Ma, Xiaoyan Zhang, Xiaofeng Du
A. Rubio Jimenez, J. E. García Navarro, M. Moreno Llácer