JOURNAL ARTICLE

Fusing Audio-Words with Visual Features for Pornographic Video Detection

Abstract

The traditional approach to filtering pornographic videos on the Internet relies on visual features extracted from keyframes. However, this approach cannot meet users' needs owing to the proliferation of low-resolution videos. To improve filtering performance, we propose a novel framework that fuses audio-words with visual features for pornographic video detection. Our intention is not only to fuse the two modalities of visual images and audio signals, but also to narrow the semantic gap between low-level features and high-level concepts by using the mid-level feature "audio-words". To further improve performance, we present a segmentation algorithm based on units of the energy envelope and a decision algorithm based on periodic patterns. The results show that our approach outperforms the traditional approach based on visual features alone and achieves satisfactory performance. Moreover, the proposed segmentation algorithm outperforms the conventional one that uses segments of equal length, and the proposed decision algorithm outperforms the conventional threshold-based one.
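The abstract names "audio-words" as a mid-level feature without spelling out how they are built. A common construction in the bag-of-words family, assumed here and not confirmed as the authors' exact pipeline, quantizes each short audio frame's feature vector to its nearest entry in a codebook and summarizes a segment by the normalized histogram of codeword counts. The sketch below is a minimal pure-Python illustration; the function names, the toy two-dimensional frames, and the fixed codebook (which in practice would typically be learned, for example by k-means over training frames) are all hypothetical.

```python
from typing import List, Sequence

Vector = Sequence[float]


def squared_distance(a: Vector, b: Vector) -> float:
    """Squared Euclidean distance between two equal-length feature vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))


def nearest_codeword(frame: Vector, codebook: Sequence[Vector]) -> int:
    """Index of the codeword closest to `frame` (ties go to the lower index)."""
    return min(range(len(codebook)), key=lambda i: squared_distance(frame, codebook[i]))


def audio_word_histogram(frames: Sequence[Vector], codebook: Sequence[Vector]) -> List[float]:
    """Hypothetical helper: map each frame to an audio-word (codebook index)
    and return the normalized histogram of audio-word occurrences."""
    if not frames:
        # An empty segment contains no audio-words; return an all-zero histogram.
        return [0.0] * len(codebook)
    counts = [0] * len(codebook)
    for frame in frames:
        counts[nearest_codeword(frame, codebook)] += 1
    return [c / len(frames) for c in counts]


if __name__ == "__main__":
    # Toy 2-D "frame features" and a fixed two-word codebook (illustrative only).
    codebook = [[0.0, 0.0], [1.0, 1.0]]
    frames = [[0.1, 0.0], [0.9, 1.1], [1.0, 0.8], [0.0, 0.2], [1.2, 1.0]]
    print(audio_word_histogram(frames, codebook))  # [0.4, 0.6]
```

Because the histogram has a fixed length regardless of segment duration, it can sit alongside fixed-length visual descriptors; how the paper actually combines the two modalities is not specified in the abstract.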

Keywords:
Computer science, Artificial intelligence, Segmentation, Fusion, Audio-visual, Visualization, Feature extraction, Computer vision, Pattern recognition, Semantic gap, Image segmentation, Speech recognition, Multimedia, Image retrieval

Metrics

Cited by: 12
Field Weighted Citation Impact (FWCI): 1.28
References: 21
Citation Normalized Percentile: 0.82

Topics

Video Analysis and Summarization
Music and Audio Processing
Advanced Steganography and Watermarking Techniques