JOURNAL ARTICLE

Audio Representation Learning with Deep Neural Networks

Mohammad Rasool Izadi

Year: 2023
Journal: OPAL (Open@LaTrobe), La Trobe University
Publisher: La Trobe University

Abstract

In this dissertation, we examined three sequence-to-sequence representation challenges: source detection and separation, sound event detection, and disentanglement. For each challenge, we introduced distinct models and assessed their performance by conducting experiments on several datasets and by comparing these results with those of other established models. Our study spanned areas such as deep learning and representation learning, and it touched on bioacoustics, urban sounds, singing voices, and speech across a range of specific tasks.
First, we developed a source segmentation model to isolate an undetermined number of bat echolocation calls from mixed audio. This design used two interconnected models working together: the primary model detected candidate sources, while the secondary model isolated individual sources in the time-frequency domain.

Next, inspired by attention and graph neural networks, we presented a method to incorporate time-level similarities across the time domain, blending features from different layers with our adaptive affinity mixup technique. This enhancement boosted the event-F1 score of our sound event detection model by 8.2% on urban sounds.

Finally, we delved into weakly supervised disentanglement using a multi-rate latent space, putting forward a framework to represent and generate variable-length sequences from paired samples. Our method combines a straightforward swapping mechanism with variational transformers, and we provided a theoretical demonstration that swapping can attain optimal disentanglement under weak supervision. Experimental results on singing voices, speech, and images confirm that our technique consistently outperforms competing methods.

In conclusion, this dissertation offers novel approaches to sequence-to-sequence representation challenges, emphasizing the combination of state-of-the-art techniques and practical applications. Our findings advance the current understanding of sound source detection, event detection, and sequential disentanglement, and set a precedent for future research in these areas. The consistent improvements observed across tasks underscore the potential of the proposed methods in diverse audio domains, hinting at broader applications and further explorations in representation learning.
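The adaptive affinity mixup described above can be sketched in a few lines. All names and details here are hypothetical stand-ins, since the abstract does not give the model's exact formulation: compute a frame-to-frame affinity (similarity) matrix over time, use it to propagate features from one layer, and blend the result with features from another layer via a mixing weight (learned in the actual model, fixed in this sketch).

```python
import numpy as np

def affinity(feats):
    """Pairwise cosine similarity between time frames: (T, D) -> (T, T)."""
    norm = feats / (np.linalg.norm(feats, axis=1, keepdims=True) + 1e-8)
    return norm @ norm.T

def affinity_mixup(shallow, deep, alpha=0.5):
    """Blend affinity-propagated shallow features into deep features.

    `alpha` is fixed here; an adaptive variant would learn it.
    Both inputs are (T, D) frame-feature matrices.
    """
    aff = affinity(deep)
    aff = np.maximum(aff, 0.0)                           # non-negative weights
    aff = aff / (aff.sum(axis=1, keepdims=True) + 1e-8)  # row-normalize, attention-style
    return alpha * (aff @ shallow) + (1.0 - alpha) * deep

# Toy usage on random frame features.
rng = np.random.default_rng(0)
shallow = rng.normal(size=(8, 4))
deep = rng.normal(size=(8, 4))
mixed = affinity_mixup(shallow, deep, alpha=0.3)
```

With alpha=0 the deep features pass through untouched, so the mixup degrades gracefully to the unmodified model.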
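The swapping mechanism for weakly supervised disentanglement can likewise be illustrated with a toy model. Assume each sequence is encoded into a slowly varying (static) latent plus per-frame (dynamic) latents; for a pair known to share its static factor, swapping the static latents before decoding should leave both reconstructions unchanged. Everything below is a hypothetical illustration under that assumption, not the dissertation's implementation (which uses variational transformers).

```python
import numpy as np

def encode(x):
    """Toy encoder: static latent = time-average, dynamic = per-frame residual."""
    static = x.mean(axis=0)    # slow factor: one vector per sequence
    dynamic = x - static       # fast factor: one vector per frame
    return static, dynamic

def decode(static, dynamic):
    """Toy decoder: recombine the two factors."""
    return dynamic + static

def swap_reconstruct(xa, xb):
    """Swap static latents between a paired sample, then decode both."""
    sa, da = encode(xa)
    sb, db = encode(xb)
    return decode(sb, da), decode(sa, db)

# Build a pair that truly shares its static factor (zero-mean dynamics),
# with different lengths to mimic variable-length sequences.
rng = np.random.default_rng(1)
shared = rng.normal(size=4)
da = rng.normal(size=(6, 4)); da -= da.mean(axis=0)
db = rng.normal(size=(9, 4)); db -= db.mean(axis=0)
xa, xb = shared + da, shared + db
ra, rb = swap_reconstruct(xa, xb)
# Because the pair shares its static factor, swapping changes nothing.
```

In a training setup, penalizing the difference between the swapped reconstructions and the originals is what pressures the static latent to carry only the shared factor.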

Keywords:
Representation learning, Feature learning, Segmentation, Echolocation, Sound event detection, Deep learning, Deep neural networks

Metrics

Cited By: 0
FWCI (Field-Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.40

Topics

Computational Physics and Python Applications
Physical Sciences →  Computer Science →  Artificial Intelligence
Gene expression and cancer classification
Life Sciences →  Biochemistry, Genetics and Molecular Biology →  Molecular Biology
Big Data and Digital Economy
Physical Sciences →  Computer Science →  Information Systems

Related Documents

BOOK-CHAPTER

Feature Representation Learning in Deep Neural Networks

Dong Yu, Li Deng

Series: Signals and Communication Technology   Year: 2014   Pages: 157-175
JOURNAL ARTICLE

Deep representation-based transfer learning for deep neural networks

Tao Yang, Xia Yu, Ning Ma, Yifu Zhang, Hongru Li

Journal: Knowledge-Based Systems   Year: 2022   Vol: 253   Pages: 109526
JOURNAL ARTICLE

Neural Audio Coding with Deep Complex Networks

Jiawei Ru, Lizhong Wang, Maoshen Jia, Liang Wen, Handong Wang, Yuhao Zhao, Jing Wang

Journal: Journal of Physics: Conference Series   Year: 2024   Vol: 2759 (1)   Pages: 012005
JOURNAL ARTICLE

Unsupervised Point Cloud Representation Learning With Deep Neural Networks: A Survey

Aoran XiaoJiaxing HuangDayan GuanXiaoqin ZhangShijian LuLing Shao

Journal: IEEE Transactions on Pattern Analysis and Machine Intelligence   Year: 2023   Vol: 45 (9)   Pages: 11321-11339