JOURNAL ARTICLE

Recovering punctuation marks for automatic speech recognition

Abstract

This paper presents results on recovering punctuation in speech transcriptions of a Portuguese broadcast news corpus. The approach is based on maximum entropy models and uses word, part-of-speech, time, and speaker information. The contribution of each feature type is analyzed individually. Separate results are given for each focus condition, making it possible to analyze the differences in performance between planned and spontaneous speech.

Index Terms: rich transcription, punctuation recovery, sentence boundary detection, maximum entropy.
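As an illustration only, the following minimal sketch shows how word, part-of-speech, time, and speaker information could be turned into binary features for a maximum entropy classifier that scores the boundary after each word. It is not the authors' implementation: the `Token` fields, the feature names, the 0.25 s pause threshold, the label set, and the hand-set weights are all assumptions made for the example. In practice the weights would be estimated from training data.

```python
import math
from dataclasses import dataclass


@dataclass
class Token:
    word: str
    pos: str      # part-of-speech tag (hypothetical tag set)
    start: float  # start time in seconds
    end: float    # end time in seconds
    speaker: str  # speaker identifier


def boundary_features(tokens, i, pause_threshold=0.25):
    """Return the set of binary features active at the boundary after tokens[i].

    The pause threshold is an assumed value, not one taken from the paper.
    """
    cur = tokens[i]
    feats = {f"w0={cur.word.lower()}", f"p0={cur.pos}"}
    if i + 1 >= len(tokens):
        feats.add("end_of_stream")
        return feats
    nxt = tokens[i + 1]
    feats.add(f"w1={nxt.word.lower()}")   # word information
    feats.add(f"p1={nxt.pos}")            # part-of-speech information
    if nxt.start - cur.end >= pause_threshold:
        feats.add("long_pause")           # time information
    if nxt.speaker != cur.speaker:
        feats.add("speaker_change")       # speaker information
    return feats


def maxent_probs(feats, weights, labels=("none", "comma", "full_stop")):
    """Maximum entropy (softmax) model: p(y | x) is proportional to
    exp(sum of the weights of the active (feature, label) pairs)."""
    scores = {y: sum(weights.get((f, y), 0.0) for f in feats) for y in labels}
    top = max(scores.values())  # subtract the max for numerical stability
    exps = {y: math.exp(s - top) for y, s in scores.items()}
    z = sum(exps.values())
    return {y: e / z for y, e in exps.items()}


# Toy usage with hand-set weights (illustrative values, not learned).
tokens = [
    Token("obrigado", "ADJ", 0.0, 0.5, "A"),
    Token("boa", "ADJ", 1.2, 1.4, "B"),
    Token("noite", "N", 1.4, 1.8, "B"),
]
weights = {("long_pause", "full_stop"): 1.5, ("speaker_change", "full_stop"): 2.0}
probs = maxent_probs(boundary_features(tokens, 0), weights)
```

Keeping each information source as its own group of features makes it straightforward to drop one group at a time and measure its individual contribution, which is the kind of per-feature analysis the abstract describes.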

Keywords:
Punctuation, Computer science, Speech recognition, Principle of maximum entropy, Sentence, Natural language processing, Focus (optics), Transcription (linguistics), Artificial intelligence, Feature (linguistics), Part of speech, Entropy (arrow of time), Portuguese, Linguistics

Metrics

Cited By: 30
FWCI (Field Weighted Citation Impact): 3.88
Refs: 5
Citation Normalized Percentile: 0.93

Topics

Speech and dialogue systems
Physical Sciences →  Computer Science →  Artificial Intelligence
Speech Recognition and Synthesis
Physical Sciences →  Computer Science →  Artificial Intelligence
Natural Language Processing Techniques
Physical Sciences →  Computer Science →  Artificial Intelligence