JOURNAL ARTICLE

Transformer Based Punctuation Restoration for Turkish

Abstract

Mobile devices and social media platforms, together with technologies such as automatic speech recognition (ASR), make communication faster than ever before. However, the speed of text-based communication introduces avoidable mistakes, two of the most common being grammatical errors and omitted punctuation. The punctuation restoration task originates in the automatic speech recognition domain, where identifying and restoring the correct positions of punctuation marks is a challenging problem. However, no dataset exists for training a punctuation restoration model for the Turkish language. This paper focuses on restoring punctuation in Turkish texts and introduces a new Turkish dataset for punctuation restoration. Three transformer models, BERT, ELECTRA, and ConvBERT, are fine-tuned and tested on the newly created dataset for three distinct labels: PERIOD, COMMA, and QUESTION MARK. Because of the imbalanced class distribution, benchmark results are reported as precision, recall, and F1 score. Although the three models show similar performance, ELECTRA achieves the best overall F1 score of 83.9%.
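The abstract frames punctuation restoration as predicting one of the labels PERIOD, COMMA, QUESTION MARK (or none) for each word of unpunctuated text. The paper does not publish its preprocessing code, so the following is only a minimal illustrative sketch of how punctuated text can be converted into (token, label) training pairs and how predicted labels can be turned back into punctuated text; all function names and the "O" (no-punctuation) label are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the token-classification framing described in the
# abstract; not the authors' actual preprocessing code.

# Map trailing punctuation marks to the three label classes from the paper.
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def text_to_examples(text):
    """Convert punctuated text into (token, label) pairs.

    Each word is labeled with the punctuation mark that follows it,
    or "O" (assumed no-punctuation label) when nothing follows.
    """
    examples = []
    for raw in text.split():
        if raw and raw[-1] in PUNCT_LABELS:
            examples.append((raw[:-1], PUNCT_LABELS[raw[-1]]))
        else:
            examples.append((raw, "O"))
    return examples


def restore(tokens, labels):
    """Reattach predicted punctuation labels to unpunctuated tokens."""
    inverse = {label: mark for mark, label in PUNCT_LABELS.items()}
    return " ".join(tok + inverse.get(lab, "") for tok, lab in zip(tokens, labels))
```

For example, `text_to_examples("Merhaba, nasılsın?")` yields `[("Merhaba", "COMMA"), ("nasılsın", "QUESTION")]`, and `restore` inverts the mapping once a fine-tuned model has predicted the labels.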

Keywords:
Punctuation restoration, Turkish, Transformer, Speech recognition, Natural language processing, Language model
