Deep Metric Learning-Assisted 3D Audio-Visual Speaker Tracking via Two-Layer Particle Filter

Yidi Li; Hong Liu; Bing Yang; Runwei Ding; Yang Chen

doi:10.1155/2020/3764309

JOURNAL ARTICLE

Year: 2020 Journal: Complexity Vol: 2020 Pages: 1-8 Publisher: Hindawi Publishing Corporation

Abstract

For speaker tracking, integrating multimodal information from audio and video provides an effective and promising solution. The main challenge lies in constructing a stable observation model. To this end, we propose a 3D audio-visual speaker tracker assisted by deep metric learning, built on a two-layer particle filter framework. First, an audio-guided motion model generates candidate samples in a hierarchical structure consisting of an audio layer and a visual layer. Then, a stable observation model is built with a purpose-designed Siamese network, which provides a similarity-based likelihood for calculating particle weights. The speaker position is estimated using an optimal particle set that integrates the decisions from audio particles and visual particles. Finally, a template update strategy based on a long short-term mechanism is adopted to prevent drift during tracking. Experimental results demonstrate that the proposed method outperforms the single-modal trackers and comparison methods. Efficient and robust tracking is achieved both in 3D space and on the image plane.
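The weighting and estimation steps described in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation. A plain cosine similarity between feature vectors stands in for the Siamese network's similarity score, and the function names, the `sharpness` parameter, and the "pool both layers and keep the highest-weighted particles" fusion rule are all assumptions made for illustration.

```python
import math


def cosine_similarity(a, b):
    """Placeholder for the learned similarity (assumption: the paper uses a Siamese network)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similarity_weights(template, candidates, sharpness=10.0):
    """Map template-candidate similarities to normalized particle weights.

    `sharpness` is a hypothetical parameter controlling how strongly
    high-similarity candidates dominate.
    """
    scores = [math.exp(sharpness * cosine_similarity(template, c)) for c in candidates]
    total = sum(scores)
    return [s / total for s in scores]


def fuse_layers(audio, visual, keep):
    """Pool (position, weight) pairs from both layers and keep the `keep` best.

    Assumed fusion rule for illustration only; weights are renormalized
    so that the retained set sums to 1.
    """
    pooled = sorted(audio + visual, key=lambda pw: pw[1], reverse=True)[:keep]
    total = sum(w for _, w in pooled)
    return [(p, w / total) for p, w in pooled]


def estimate_position(particles, weights):
    """Weighted mean of particle positions (tuples of equal length)."""
    dims = len(particles[0])
    return tuple(sum(w * p[d] for p, w in zip(particles, weights)) for d in range(dims))


if __name__ == "__main__":
    # Toy feature vectors: the first candidate matches the template exactly.
    template = [1.0, 0.0]
    candidates = [[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0]]
    print(similarity_weights(template, candidates))

    # Toy 3D particles from each layer, already weighted.
    audio = [((0.0, 0.0, 0.0), 0.1), ((1.0, 1.0, 1.0), 0.4)]
    visual = [((2.0, 2.0, 2.0), 0.4), ((3.0, 3.0, 3.0), 0.1)]
    fused = fuse_layers(audio, visual, keep=2)
    positions = [p for p, _ in fused]
    weights = [w for _, w in fused]
    print(estimate_position(positions, weights))
```

In a full tracker these steps would sit inside a loop that also propagates particles with a motion model and resamples them; both are omitted here to keep the sketch short.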

Keywords:

Particle filter; Computer science; Metric (unit); Artificial intelligence; Eye tracking; BitTorrent tracker; Tracking (education); Audio visual; Layer (electronics); Similarity (geometry); Filter (signal processing); Computer vision; Set (abstract data type); Pattern recognition (psychology); Speech recognition; Image (mathematics)

Metrics

FWCI (Field Weighted Citation Impact): 0.00

Citation Normalized Percentile: 0.10

Topics

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Video Surveillance and Tracking Methods

Physical Sciences → Computer Science → Computer Vision and Pattern Recognition

Indoor and Outdoor Localization Technologies

Physical Sciences → Engineering → Electrical and Electronic Engineering

Related Documents

3D Audio-Visual Speaker Tracking with A Two-Layer Particle Filter

A joint particle filter for audio-visual speaker tracking
