This paper presents results on recovering punctuation in speech transcriptions of a Portuguese broadcast news corpus. The approach is based on maximum entropy models and uses word, part-of-speech, time, and speaker information. The contribution of each feature type is analyzed individually. Results are reported separately for each focus condition, which makes it possible to compare performance on planned and spontaneous speech.

Index Terms: rich transcription, punctuation recovery, sentence boundary detection, maximum entropy
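As a rough illustration of the kind of model the abstract names, the sketch below trains a small maximum entropy (multinomial logistic regression) classifier that decides which punctuation mark, if any, follows each word. It is not the paper's implementation: the feature templates, the 0.3 s pause threshold, the POS tags, the label names, the toy utterance, and all function names are assumptions made for this example.

```python
import math
from collections import defaultdict

def extract_features(tokens, i):
    """Binary features for the boundary after tokens[i] (hypothetical templates)."""
    cur = tokens[i]
    feats = ["bias", "w0=" + cur["word"], "p0=" + cur["pos"]]
    if i + 1 < len(tokens):
        nxt = tokens[i + 1]
        feats += ["w1=" + nxt["word"], "p1=" + nxt["pos"]]
        # Time information: silence between the two words (threshold is arbitrary).
        pause = nxt["start"] - cur["end"]
        feats.append("pause>0.3" if pause > 0.3 else "pause<=0.3")
        # Speaker information: does the speaker change at this boundary?
        if nxt["speaker"] != cur["speaker"]:
            feats.append("speaker_change")
    else:
        feats.append("end_of_input")
    return feats

def predict_proba(weights, feats):
    """Softmax over per-class sums of the active feature weights."""
    scores = {c: sum(w.get(f, 0.0) for f in feats) for c, w in weights.items()}
    top = max(scores.values())
    exps = {c: math.exp(s - top) for c, s in scores.items()}
    z = sum(exps.values())
    return {c: e / z for c, e in exps.items()}

def train(examples, labels, epochs=200, lr=0.5, l2=0.01):
    """Batch gradient ascent on the L2-regularized conditional log-likelihood."""
    classes = sorted(set(labels))
    weights = {c: defaultdict(float) for c in classes}
    for _ in range(epochs):
        grad = {c: defaultdict(float) for c in classes}
        for feats, y in zip(examples, labels):
            probs = predict_proba(weights, feats)
            for c in classes:
                # Observed minus expected count of each active feature.
                delta = (1.0 if c == y else 0.0) - probs[c]
                for f in feats:
                    grad[c][f] += delta
        for c in classes:
            for f, g in grad[c].items():
                weights[c][f] += lr * (g / len(examples) - l2 * weights[c][f])
    return weights

def predict(weights, feats):
    probs = predict_proba(weights, feats)
    return max(probs, key=probs.get)

def tok(word, pos, start, end, speaker):
    return {"word": word, "pos": pos, "start": start, "end": end, "speaker": speaker}

# Toy utterance with made-up timings (seconds), POS tags, and speaker ids.
tokens = [
    tok("boa", "ADJ", 0.0, 0.3, "A"), tok("noite", "N", 0.3, 0.7, "A"),
    tok("o", "DET", 1.5, 1.6, "A"), tok("governo", "N", 1.6, 2.1, "A"),
    tok("anunciou", "V", 2.1, 2.7, "A"), tok("medidas", "N", 2.7, 3.3, "A"),
    tok("sim", "ADV", 4.0, 4.3, "B"), tok("obrigado", "ADJ", 4.3, 4.9, "B"),
]
# Punctuation mark following each token (illustrative labels).
labels = ["none", "full_stop", "none", "none", "none", "full_stop", "none", "full_stop"]

examples = [extract_features(tokens, i) for i in range(len(tokens))]
weights = train(examples, labels)
predicted = [predict(weights, feats) for feats in examples]
print(list(zip([t["word"] for t in tokens], predicted)))
```

Dropping one feature group from `extract_features` (for example the pause feature) and retraining is a simple way to mimic, on toy data, the per-feature contribution analysis the abstract mentions.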
Fernando Batista, Diamantino Caseiro, Nuno Mamede, Isabel Trancoso
Lou-Ann Kleppa, Renato Miguel Basso
Mehmet Efe Yuzuguler, C. Okan Sakar
Wenzhu Shen, Roger Peng Yu, Frank Seide, Ji Wu