Yipei Wang, Shourabh Rawat, Florian Metze
Audio semantic concepts (sound events) play an important role in audio-based content analysis. Capturing semantic information effectively from the complex occurrence patterns of sound events in YouTube-quality videos is a challenging problem. This paper presents a novel framework for semantic information extraction in real-world videos and evaluates it through the NIST Multimedia Event Detection (MED) task. We compute an occurrence confidence matrix of sound events and explore multiple strategies for generating clip-level semantic features from this matrix. We evaluate performance on the TRECVID 2011 MED dataset, where the proposed method outperforms a previous HMM-based system. A late-fusion experiment with low-level features and a text feature (ASR) shows that audio semantic concepts capture complementary information in the soundtrack.
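The pooling step described in the abstract can be sketched as follows. This is a minimal illustration, not the paper's actual pipeline: the matrix values, the choice of max and mean pooling, and the concatenation are all assumptions standing in for the "multiple strategies" the paper explores. Rows represent time windows of the clip and columns represent sound-event concepts.

```python
import numpy as np

# Hypothetical occurrence confidence matrix (illustrative values only):
# rows = time windows within one clip, columns = sound-event concepts.
conf = np.array([
    [0.1, 0.8, 0.3],
    [0.4, 0.6, 0.2],
    [0.9, 0.1, 0.5],
])

# Two simple pooling strategies for turning the matrix into a
# fixed-length clip-level semantic feature vector.
max_pool = conf.max(axis=0)    # strongest evidence for each concept
mean_pool = conf.mean(axis=0)  # average occurrence confidence per concept

# One possible clip-level feature: concatenate both poolings.
clip_feature = np.concatenate([max_pool, mean_pool])
print(clip_feature)
```

Other pooling choices (e.g. thresholded counts or temporal segments) would slot in the same way, yielding different clip-level feature vectors from the same confidence matrix.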