JOURNAL ARTICLE

Fine Grained Spoken Document Summarization Through Text Segmentation

Samantha Kotey, Rozenn Dahyot, Naomi Harte

Year: 2023 Journal: 2022 IEEE Spoken Language Technology Workshop (SLT) Vol: 33 Pages: 647-654

Abstract

Podcast transcripts are long spoken documents of conversational dialogue. They are challenging to summarize: podcasts cover a diverse range of topics, vary in length, and differ widely in linguistic style. Previous studies in podcast summarization have generated short, concise dialogue summaries. In contrast, we propose a method to generate long, fine-grained summaries that describe the details of sub-topic narratives. Leveraging a readability formula, we curate a data subset on which we train a long-sequence transformer for abstractive summarization. Using text segmentation, we filter the evaluation data and exclude specific segments of text. We apply the model to the segmented data, producing different types of fine-grained summaries. We show that appropriate filtering yields comparable ROUGE scores and serves as an alternative to truncation. Experiments show that our model outperforms previous studies on the Spotify podcast dataset when tasked with generating longer sequences of text.
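The abstract does not name the readability formula, the selection thresholds, or the segmentation method. As a minimal sketch only, assuming the Flesch Reading Ease formula (swapped in here as a common readability measure, not confirmed as the authors' choice), hypothetical score bounds, and segments already produced by some upstream text segmentation step, readability-based data curation and segment exclusion might look like this. The syllable counter is a rough vowel-group heuristic, and the function names are illustrative.

```python
import re

# Heuristic: each run of vowels (including 'y') approximates one syllable.
VOWEL_GROUPS = re.compile(r"[aeiouy]+")


def count_syllables(word: str) -> int:
    """Approximate the syllable count of a single word (rough heuristic)."""
    word = word.lower()
    count = len(VOWEL_GROUPS.findall(word))
    # A trailing silent 'e' usually adds no syllable ("make"), but "-le" does ("table").
    if word.endswith("e") and not word.endswith("le") and count > 1:
        count -= 1
    return max(count, 1)


def flesch_reading_ease(text: str) -> float:
    """Flesch Reading Ease: higher scores indicate easier-to-read text.

    206.835 - 1.015 * (words / sentences) - 84.6 * (syllables / words)
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    words = re.findall(r"[A-Za-z']+", text)
    if not sentences or not words:
        return 0.0
    syllables = sum(count_syllables(w) for w in words)
    return (206.835
            - 1.015 * (len(words) / len(sentences))
            - 84.6 * (syllables / len(words)))


def curate_subset(transcripts: list[str], low: float, high: float) -> list[str]:
    """Keep transcripts whose readability falls in [low, high].

    The bounds are hypothetical parameters; the paper's values are not given here.
    """
    return [t for t in transcripts if low <= flesch_reading_ease(t) <= high]


def drop_segments(segments: list[str], excluded: set[int]) -> str:
    """Rebuild a transcript from its segments, skipping excluded segment indices.

    Which segments to exclude is a hypothetical input; in the paper this choice
    comes from text segmentation and serves as an alternative to truncation.
    """
    return " ".join(seg for i, seg in enumerate(segments) if i not in excluded)
```

For example, `flesch_reading_ease("The cat sat on the mat.")` scores well above a sentence of long, polysyllabic words, so a band such as `curate_subset(texts, low=50.0, high=120.0)` keeps the former and discards the latter. Unlike truncation, which always cuts from the end of a transcript, `drop_segments` can remove material from any position while keeping the rest intact.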

Keywords:
Automatic summarization, Computer science, Readability, Natural language processing, Transformer, Artificial intelligence, Segmentation, Plain text, Information retrieval, Filter (signal processing)
