In this paper, we describe a new speaker change detection (SCD) algorithm designed for fast transcription and audio indexing of spoken broadcast news. The algorithm has two stages. The first is a gender-independent phone-class recognition pass: we collapse the phoneme inventory to only 4 broad classes and include 4 different models for non-speech, resulting in a small, fast decoder that runs in less than 0.1 times real time. The second stage hypothesizes a speaker change boundary between every pair of adjacent phones in the labeled input. The phone-level time resolution of our approach permits the algorithm to run quickly while maintaining the same accuracy as a frame-level approach. Applying the new algorithm to a large sample of broadcast news programs resulted in improvements in speaker change detection accuracy, speech recognition accuracy, and speed.
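To make the phone-level resolution concrete, the following is a minimal sketch of how candidate change points might be enumerated from a phone-class labeling. It is an illustration under stated assumptions, not the implementation described here: the `Segment` type, the class labels, the example timings, and the 10 ms frame step are all hypothetical, and the test that decides whether a candidate is a true speaker change is not shown because the text above does not specify it.

```python
from typing import List, NamedTuple


class Segment(NamedTuple):
    label: str    # hypothetical broad-class or non-speech label
    start: float  # segment start time in seconds
    end: float    # segment end time in seconds


def candidate_boundaries(segments: List[Segment]) -> List[float]:
    """Return one candidate change time per junction between adjacent segments."""
    return [left.end for left, _right in zip(segments, segments[1:])]


def frame_level_candidates(duration: float, frame_step: float = 0.01) -> int:
    """Count the interior frame boundaries a frame-level search would consider.

    The 10 ms default frame step is an assumption made for this comparison.
    """
    return max(int(round(duration / frame_step)) - 1, 0)


# Hypothetical output of the first-stage phone-class decoder.
segments = [
    Segment("vowel", 0.00, 0.12),
    Segment("nasal", 0.12, 0.20),
    Segment("silence", 0.20, 0.55),
    Segment("fricative", 0.55, 0.63),
    Segment("vowel", 0.63, 0.80),
]

phone_candidates = candidate_boundaries(segments)
frame_candidates = frame_level_candidates(segments[-1].end)
print(len(phone_candidates), "phone-level candidates vs", frame_candidates, "frame-level candidates")
```

In this toy example the second stage would test 4 candidate times instead of 79, which suggests why restricting hypotheses to phone boundaries can reduce the search cost.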