JOURNAL ARTICLE

Multimodal Urban Scene Understanding

Rajsuryan Singh

Year: 2022 Journal:   Zenodo (CERN European Organization for Nuclear Research)   Publisher: European Organization for Nuclear Research

Abstract

Early computational approaches for sound source localization, originating in robotics, were modeled after animal perception and utilized audiovisual synchrony and spatial information inferred from multichannel audio. More recent deep learning-based methods focus on learning semantic audiovisual representations in a self-supervised manner and using them to localize sounding objects. Most of these approaches, by design, exclude the temporal context that a video provides. While this is not a hurdle on widely used benchmark datasets, which are biased towards large single objects in the middle of the image, the methods fall short in more challenging scenarios such as urban traffic videos. This thesis explores ways to introduce temporal context into state-of-the-art methods for sound source localization in urban scenes, using optical flow to encode motion information. An analysis of the strengths and weaknesses of our methods helps us better understand the problem of visual sound source localization and sheds new light on the characteristics of our dataset.

Keywords:
Context (archaeology); Focus (optics); Perception; Strengths and weaknesses; Benchmark (surveying); ENCODE; Motion (physics); Spatial contextual awareness

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
Refs: 0
Citation Normalized Percentile: 0.29

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Generative Adversarial Networks and Image Synthesis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

DISSERTATION

Multimodal Urban Scene Understanding

Rajsuryan Singh

University:   Zenodo (CERN European Organization for Nuclear Research) Year: 2022

JOURNAL ARTICLE

Multimodal information fusion for urban scene understanding

Philippe Xu, Franck Davoine, Jean-Baptiste Bordes, Huijing Zhao, Thierry Denœux

Journal:   Machine Vision and Applications Year: 2014 Vol: 27 (3) Pages: 331-349

BOOK

Multimodal Scene Understanding

Elsevier eBooks Year: 2019

JOURNAL ARTICLE

Multimodal Computational Attention for Scene Understanding

Boris Schauerte

Journal:   Repository KITopen (Karlsruhe Institute of Technology) Year: 2014