Abstract

Multimedia Event Detection (MED) is an annual task in the NIST TRECVID evaluation, and requires participants to build indexing and retrieval systems for locating videos in which certain predefined events are shown. Typical systems focus heavily on the use of visual data. Audio data, however, also contains rich information that can be effectively used for video retrieval, and MED could benefit from the attention of researchers in audio analysis. We present several systems for performing MED using only audio data, report the results of each system on the TRECVID MED 2011 development dataset, and compare the strengths and weaknesses of each approach.

Keywords:
Computer science NIST Search engine indexing Focus (optics) Event (particle physics) Task (project management) Strengths and weaknesses Information retrieval Audio analyzer Video retrieval Multimedia Audio signal processing Speech recognition Audio signal Speech coding

Metrics

59
Cited By
3.66
FWCI (Field Weighted Citation Impact)
8
Refs
0.95
Citation Normalized Percentile
Is in top 1%
Is in top 10%

Citation History

Topics

Music and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Speech and Audio Processing
Physical Sciences →  Computer Science →  Signal Processing
Video Analysis and Summarization
Physical Sciences →  Computer Science →  Computer Vision and Pattern Recognition
© 2026 ScienceGate Book Chapters — All rights reserved.