BOOK-CHAPTER

Audio-Visual and Visual-Only Speech and Speaker Recognition

Abstract

The information imbedded in the visual dynamics of speech has the potential to improve the performance of speech and speaker recognition systems. The information carried in the visual speech signal compliments the information in the acoustic speech signal, which is particularly beneficial in adverse acoustic environments. Non-invasive methods using low-cost sensors can be used to obtain acoustic and visual biometric signals, such as a person’s voice and lip movement, with little user cooperation. These types of unobtrusive biometric systems are warranted to promote widespread adoption of biometric technology in today’s society. In this chapter, the authors describe the main components and theory of audio-visual and visual-only speech and speaker recognition systems. Audio-visual corpora are described and a number of speech and speaker recognition systems are reviewed. Finally, various open issues about the system design and implementation, and present future research and development directions in this area are discussed.

Keywords:
Speech recognition Computer science Biometrics Audio visual Speaker recognition SIGNAL (programming language) Human–computer interaction Artificial intelligence Multimedia

Metrics

7
Cited By
0.89
FWCI (Field Weighted Citation Impact)
0
Refs
0.76
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Digital Media Forensic Detection
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
© 2026 ScienceGate Book Chapters — All rights reserved.