JOURNAL ARTICLE

Transformer Based Punctuation Restoration for Turkish

Abstract

Mobile devices and social media platforms, together with technologies such as automatic speech recognition (ASR), make communication faster than ever before. However, the speed of text-based communication introduces avoidable mistakes, two of the most common being grammatical errors and omitted punctuation. The punctuation restoration task originates in the automatic speech recognition domain, where identifying and restoring the correct positions of punctuation marks is a challenging problem. However, no dataset exists for training a punctuation restoration model for the Turkish language. This paper focuses on restoring punctuation in Turkish texts and introduces a new Turkish dataset for punctuation restoration. Three transformer models, BERT, ELECTRA, and ConvBERT, are fine-tuned and tested on the newly created dataset for three distinct labels: PERIOD, COMMA, and QUESTION MARK. Because of the imbalanced class distribution, benchmark results are reported as precision, recall, and F1 score. Although the three models show similar performance, ELECTRA achieves the best overall F1 score of 83.9%.
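The abstract frames punctuation restoration as predicting one of the labels PERIOD, COMMA, QUESTION MARK (or none) for each word of unpunctuated text. The paper does not publish its preprocessing code, so the following is only a minimal illustrative sketch of how punctuated text can be converted into (token, label) training pairs and how predicted labels can be turned back into punctuated text; all function names and the "O" (no-punctuation) label are assumptions, not the authors' implementation.

```python
# Hypothetical sketch of the token-classification framing described in the
# abstract; not the authors' actual preprocessing code.

# Map trailing punctuation marks to the three label classes from the paper.
PUNCT_LABELS = {".": "PERIOD", ",": "COMMA", "?": "QUESTION"}


def text_to_examples(text):
    """Convert punctuated text into (token, label) pairs.

    Each word is labeled with the punctuation mark that follows it,
    or "O" (assumed no-punctuation label) when nothing follows.
    """
    examples = []
    for raw in text.split():
        if raw and raw[-1] in PUNCT_LABELS:
            examples.append((raw[:-1], PUNCT_LABELS[raw[-1]]))
        else:
            examples.append((raw, "O"))
    return examples


def restore(tokens, labels):
    """Reattach predicted punctuation labels to unpunctuated tokens."""
    inverse = {label: mark for mark, label in PUNCT_LABELS.items()}
    return " ".join(tok + inverse.get(lab, "") for tok, lab in zip(tokens, labels))
```

For example, `text_to_examples("Merhaba, nasılsın?")` yields `[("Merhaba", "COMMA"), ("nasılsın", "QUESTION")]`, and `restore` inverts the mapping once a fine-tuned model has predicted the labels.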

Keywords:
Punctuation restoration, Turkish, Transformer, Speech recognition, Natural language processing, Language model
