JOURNAL ARTICLE

Word Topical Mixture Models for Extractive Spoken Document Summarization

Abstract

This paper considers extractive summarization of Chinese spoken documents. In contrast to conventional approaches, we address the extractive summarization problem within a probabilistic generative framework. A word topical mixture model (w-TMM) is proposed to explore the co-occurrence relationships between words of the language. Each sentence of the spoken document to be summarized is treated as a composite w-TMM for generating the document, and sentences are ranked and selected according to their likelihoods of generating it. Various modeling structures and learning approaches were extensively investigated. In addition, the summarization capabilities of the proposed approach were verified by comparison with other conventional summarization approaches. Experiments were performed on Chinese broadcast news collected in Taiwan, and noticeable performance gains were obtained. The proposed summarization technique has also been integrated into our prototype system for voice retrieval of broadcast news via mobile devices.
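The ranking idea in the abstract can be illustrated with a minimal sketch. It assumes that each word w_j has its own mixture over K latent topics, so that P(w_i | M_wj) = Σ_k P(w_i | T_k) P(T_k | M_wj), and that a sentence S combines the word models of its own words, weighted by their relative frequency in S. Sentences are then scored by the log-likelihood of the whole document under that composite model. The topic parameters below are hand-set toy values (in practice they would be estimated from training data, which is not shown). The frequency weighting, the uniform-background interpolation weight `lam`, and all names (`TOPIC_WORD`, `WORD_TOPIC`, `summarize`) are hypothetical choices for illustration, not the paper's exact formulation.

```python
import math
from collections import Counter

# Hypothetical toy parameters (not from the paper): K = 2 topics over a 4-word vocabulary.
# TOPIC_WORD[k][w] = P(w | T_k), the word distribution of topic k.
TOPIC_WORD = [
    {"stock": 0.4, "market": 0.4, "rain": 0.1, "storm": 0.1},
    {"stock": 0.1, "market": 0.1, "rain": 0.4, "storm": 0.4},
]
# WORD_TOPIC[w][k] = P(T_k | M_w), the topic mixture weights of each word's own model.
WORD_TOPIC = {
    "stock": [0.9, 0.1],
    "market": [0.8, 0.2],
    "rain": [0.2, 0.8],
    "storm": [0.1, 0.9],
}
VOCAB = sorted(TOPIC_WORD[0])


def word_tmm_prob(w_i, w_j):
    """P(w_i | M_{w_j}) = sum_k P(w_i | T_k) * P(T_k | M_{w_j})."""
    return sum(TOPIC_WORD[k][w_i] * WORD_TOPIC[w_j][k] for k in range(len(TOPIC_WORD)))


def sentence_prob(w_i, sentence, lam=0.9):
    """Composite sentence model: the word TMMs of the sentence's words, weighted by
    relative frequency, interpolated with a uniform background (assumed smoothing)."""
    counts = Counter(sentence)
    n = len(sentence)
    composite = sum(c / n * word_tmm_prob(w_i, w_j) for w_j, c in counts.items())
    return lam * composite + (1 - lam) / len(VOCAB)


def doc_log_likelihood(document, sentence):
    """log P(D | M_S): sum of log P(w_i | M_S) over every token of the document."""
    return sum(math.log(sentence_prob(w_i, sentence)) for w_i in document)


def summarize(sentences, num_selected):
    """Rank sentences by how likely their composite model generates the whole
    document; return the indices of the top ones in original document order."""
    document = [w for s in sentences for w in s]
    ranked = sorted(
        range(len(sentences)),
        key=lambda i: doc_log_likelihood(document, sentences[i]),
        reverse=True,
    )
    return sorted(ranked[:num_selected])


if __name__ == "__main__":
    sentences = [["stock", "market", "stock"], ["rain", "storm"], ["market", "stock", "market"]]
    print(summarize(sentences, 1))  # → [0]
```

In this toy document most tokens are finance words, so the finance sentences explain the document better than the weather sentence and are selected first. The background interpolation only keeps every probability strictly positive; other smoothing or sentence-weighting schemes could be substituted.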

Keywords:
Automatic summarization, Computer science, Natural language processing, Artificial intelligence, Sentence, Word, Multi-document summarization, Language model, Generative grammar, Information retrieval, Speech recognition, Linguistics

Metrics

Cited by: 6
FWCI (Field Weighted Citation Impact): 1.94
References: 19
Citation Normalized Percentile: 0.87

Topics

Natural Language Processing Techniques
Physical Sciences → Computer Science → Artificial Intelligence
Topic Modeling
Physical Sciences → Computer Science → Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences → Computer Science → Artificial Intelligence

Related Documents

JOURNAL ARTICLE

Extractive spoken document summarization for information retrieval

Berlin Chen, Yi-Ting Chen

Journal: Pattern Recognition Letters, Year: 2007, Vol: 29 (4), Pages: 426-437

BOOK-CHAPTER

Topical Extractive Summarization

Kristina Zheltova, Anastasia Ianina, Valentin Malykh

Communications in Computer and Information Science, Year: 2022, Pages: 14-26