Spoken document retrieval (SDR) has been studied extensively in recent years because of its potential for navigating large multimedia collections. This paper presents a novel approach that applies content-based language models to spoken document retrieval. In an example task, the retrieval of Mandarin broadcast news, the content-based language models were either trained on the automatic transcriptions of the spoken documents or adapted from the baseline language models using those transcriptions. They were then used to produce more accurate recognition results and indexing terms from both the spoken documents and the speech queries. We report on some interesting findings obtained in this research.
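As a rough illustration of what adapting a baseline language model with automatic transcriptions could look like, the sketch below linearly interpolates two unigram models. This is an assumption made for illustration only: the abstract does not state which adaptation method, model order, or weights were used, and the names `unigram_lm`, `interpolate`, and `lam`, as well as the toy English token lists, are hypothetical.

```python
from collections import Counter


def unigram_lm(tokens):
    """Maximum-likelihood unigram model: word -> relative frequency."""
    counts = Counter(tokens)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}


def interpolate(baseline, content, lam=0.5):
    """Linear interpolation of two unigram models.

    lam is the weight given to the content-based model (hypothetical
    parameter; the source does not specify an interpolation scheme).
    """
    vocab = set(baseline) | set(content)
    return {
        word: (1 - lam) * baseline.get(word, 0.0) + lam * content.get(word, 0.0)
        for word in vocab
    }


# Toy data standing in for a baseline corpus and an automatic transcription.
baseline_lm = unigram_lm("the market rose the market fell".split())
transcript_lm = unigram_lm("typhoon warning issued typhoon".split())

# Shift some probability mass toward words seen in the transcription.
adapted_lm = interpolate(baseline_lm, transcript_lm, lam=0.3)
```

In this toy setup, words that occur only in the transcription receive non-zero probability in the adapted model, which is the general intuition behind using content-based models in a second recognition pass.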
Paula Lopez-Otero, Javier Parapar, Álvaro Barreiro
Xinhui Hu, Ryosuke Isotani, Satoshi Nakamura
Kuan-Yu Chen, Hung-Shin Lee, Hsin-Min Wang, Berlin Chen, Hsin-Hsi Chen
Hsiao-Yun Lin, Tien-Hong Lo, Berlin Chen