JOURNAL ARTICLE

Multi-feature audio-visual person recognition

Abstract

We propose a high-performance, low-complexity audio-visual person recognition framework suitable for online user authentication in web applications. The framework is robust against various types of impostor attacks because it captures face and speech dynamics from video of the user. Instead of using the traditional frontal-face image, a set of compressed face profile vectors is extracted from multiple poses of the person. Similarly, multiple user-selected passwords are used to increase robustness against impostor attacks. A novel speech feature, FGRAM-CFD, is proposed that captures the identity of the user from the speech dynamics contained in the password. The novel signal processing methods proposed here for speech and face feature extraction give the combined audio-visual features high discriminative power. This allows the classifier to remain simple while still delivering reasonably high performance at low complexity, as demonstrated by our trials on a 210-user audio-visual biometric database created for this research.
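
The idea of pairing discriminative fused features with a simple classifier can be illustrated with a minimal sketch. This is not the authors' method: the abstract does not specify how FGRAM-CFD or the compressed face profile vectors are computed, nor which classifier is used. The sketch assumes the feature vectors are already available as plain lists of floats, uses concatenation for fusion, and uses a nearest-template Euclidean distance with a fixed threshold as a stand-in classifier. All function names, the toy vectors, and the threshold value are hypothetical.

```python
import math


def fuse(face_vec, speech_vec):
    """Feature-level fusion by concatenating the two modality vectors (an assumption)."""
    return list(face_vec) + list(speech_vec)


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have equal length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def enroll(samples):
    """Build a user's template set from (face_vec, speech_vec) pairs,
    for example one pair per pose and password combination."""
    return [fuse(face, speech) for face, speech in samples]


def verify(templates, face_vec, speech_vec, threshold):
    """Accept the claim if the probe lies within `threshold` of the
    closest enrolled template. Returns (accepted, best_distance)."""
    probe = fuse(face_vec, speech_vec)
    best = min(euclidean(probe, t) for t in templates)
    return best <= threshold, best


# Toy, made-up feature vectors standing in for real face and speech features.
alice = enroll([
    ([0.1, 0.9], [0.5, 0.4]),
    ([0.2, 0.8], [0.6, 0.3]),
])

genuine_ok, genuine_dist = verify(alice, [0.15, 0.85], [0.55, 0.35], threshold=0.3)
impostor_ok, impostor_dist = verify(alice, [0.9, 0.1], [0.1, 0.9], threshold=0.3)
print(genuine_ok, impostor_ok)
```

With these toy values the genuine probe falls close to both enrolled templates and is accepted, while the impostor probe is far from both and is rejected. In practice the threshold would be tuned on held-out data, and the two modalities would usually be normalized before concatenation so that neither dominates the distance.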

Keywords:
Computer science; Robustness (evolution); Feature extraction; Discriminative model; Speech recognition; Artificial intelligence; Password; Classifier (UML); Biometrics; Facial recognition system; Pattern recognition (psychology); Speaker recognition; Replay attack

Metrics

Cited By: 2
FWCI (Field Weighted Citation Impact): 0.34
Refs: 27
Citation Normalized Percentile: 0.64

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Face recognition and analysis
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Digital Media Forensic Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition

Related Documents

JOURNAL ARTICLE

ROBUST MULTIMODAL PERSON RECOGNITION USING LOW-COMPLEXITY AUDIO-VISUAL FEATURE FUSION APPROACHES

Dhaval N. Shah, Kyu J. Han, Shrikanth Narayanan

Journal: International Journal of Semantic Computing, Year: 2010, Vol: 04 (02), Pages: 155-179

JOURNAL ARTICLE

Multimodal person recognition in audio-visual streams

Do Le

Journal: Infoscience (Ecole Polytechnique Fédérale de Lausanne), Year: 2019

JOURNAL ARTICLE

Visual feature analysis for audio-visual speech recognition

Ivana Arsic

Journal: Infoscience (Ecole Polytechnique Fédérale de Lausanne), Year: 2008