Tsolmon Zundui, Khuyagbaatar Batsuren, Tsendsuren Munkhdalai, Amarsanaa Ganbold
Cognates are words that share a common origin or have been borrowed across languages, often exhibiting similarities in both sound and meaning. In this work, we introduce a fully character-level neural sequence-to-sequence model for cognate production. The model operates directly on characters to transform a source word into its corresponding cognate in the target language, thereby obviating out-of-vocabulary issues and removing the need for subword segmentation. We evaluated our approach on a novel dataset and found that it outperforms both statistical machine translation baselines and prior neural methods trained on the same data, as measured by standard coverage and mean reciprocal rank metrics. These results underscore the effectiveness of character-level sequence-to-sequence architectures for cognate generation in diverse language settings, including cross-alphabetic transformations.
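To illustrate why a character-level model sidesteps out-of-vocabulary issues, the following minimal sketch builds a character vocabulary and encodes words as character-id sequences, the input representation a model like the one described would consume. The function names and the toy word list are illustrative assumptions, not the authors' actual implementation.

```python
# Sketch: character-level encoding for a cognate-production model.
# Every word decomposes into characters, so any word over the known
# alphabets can be represented -- there is no out-of-vocabulary case
# at the word level, and no subword segmenter is needed.

def build_char_vocab(words):
    """Map every character seen in the data to an integer id.
    Reserved ids: 0 = <pad>, 1 = <sos>, 2 = <eos>."""
    chars = sorted({c for w in words for c in w})
    vocab = {"<pad>": 0, "<sos>": 1, "<eos>": 2}
    vocab.update({c: i + 3 for i, c in enumerate(chars)})
    return vocab

def encode(word, vocab):
    """Turn a word into the id sequence <sos> c1 ... cn <eos>."""
    return [vocab["<sos>"]] + [vocab[c] for c in word] + [vocab["<eos>"]]

def decode(ids, vocab):
    """Invert encode(), dropping the reserved symbols."""
    inv = {i: c for c, i in vocab.items()}
    return "".join(inv[i] for i in ids if i > 2)

# Toy cross-alphabet data: Latin and Cyrillic words share one vocabulary.
words = ["nacht", "night", "ночь"]
vocab = build_char_vocab(words)
assert decode(encode("night", vocab), vocab) == "night"
assert decode(encode("ночь", vocab), vocab) == "ночь"
```

In a full sequence-to-sequence setup, these id sequences would feed a character-level encoder, and the decoder would emit target-language characters one at a time until producing `<eos>`.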