Fine-Tuning Large Language Models for Kazakh Text Simplification

Alymzhan Toleu; Gulmira Tolegen; Irina Ualiyeva

doi:10.3390/app15158344

JOURNAL ARTICLE

Year: 2025; Journal: Applied Sciences; Vol: 15 (15); Pages: 8344-8344; Publisher: Multidisciplinary Digital Publishing Institute

Abstract

This paper addresses the text simplification task for Kazakh, a morphologically rich, low-resource language, by introducing KazSim, an instruction-tuned model built on multilingual large language models (LLMs). First, we develop a heuristic pipeline to identify complex Kazakh sentences, manually validate its performance on 400 examples, and compare it against a purely LLM-based selection method; we then use this pipeline to assemble a parallel corpus of 8709 complex–simple pairs via LLM augmentation. For the simplification task, we benchmark KazSim against standard Seq2Seq systems, domain-adapted Kazakh LLMs, and zero-shot instruction-following models. On an automatically constructed test set, KazSim (Llama-3.3-70B) achieves BLEU 33.50, SARI 56.38, and F1 87.56 with a length ratio of 0.98, outperforming all baselines. We also explore the effect of prompt language (English vs. Kazakh) and conduct a human evaluation with three native speakers: KazSim scores 4.08 for fluency, 4.09 for meaning preservation, and 4.42 for simplicity, significantly above GPT-4o-mini. Error analysis shows that the remaining failures cluster into tone change, tense change, and semantic drift, reflecting Kazakh’s agglutinative morphology and flexible syntax.
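
The abstract reports a length ratio of 0.98 but does not say how it is computed. The sketch below shows one common reading, offered as an assumption and not as the authors' procedure: a corpus-level ratio of system-output length to reference length, counted in whitespace-separated tokens. The function name `length_ratio` and the choice of tokenization are hypothetical.

```python
from typing import Sequence


def length_ratio(outputs: Sequence[str], references: Sequence[str]) -> float:
    """Corpus-level length ratio: total output tokens / total reference tokens.

    Assumption: tokens are whitespace-separated; the paper may tokenize
    differently or compare against the source sentence instead.
    """
    if len(outputs) != len(references):
        raise ValueError("outputs and references must have the same length")
    out_tokens = sum(len(o.split()) for o in outputs)
    ref_tokens = sum(len(r.split()) for r in references)
    if ref_tokens == 0:
        raise ValueError("references contain no tokens")
    return out_tokens / ref_tokens


if __name__ == "__main__":
    # Toy strings only; these are not examples from the KazSim corpus.
    system_outputs = ["a b c", "d e"]
    gold_references = ["a b c d", "d e"]
    print(round(length_ratio(system_outputs, gold_references), 2))
```

Under this reading, a value close to 1.0 means the simplified outputs are about as long as the references, so gains in BLEU or SARI are unlikely to come from simply truncating or padding sentences.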

Keywords: Kazakh; Computer science; Natural language processing; Linguistics; Philosophy

Topics

Text Readability and Simplification; Natural Language Processing Techniques; Topic Modeling (all under Physical Sciences → Computer Science → Artificial Intelligence)
