JOURNAL ARTICLE

RoBERTa: language modelling in building Indonesian question-answering systems

Wiwin Suwarningsih, Raka Aditya Pramata, Fadhil Yusuf Rahadika, Mochamad Havid Albar Purnomo

Year: 2022 Journal: TELKOMNIKA (Telecommunication Computing Electronics and Control) Vol: 20 (6) Pages: 1248-1248 Publisher: Ahmad Dahlan University

Abstract

This research evaluated the performance of three models, A Lite BERT (ALBERT), Efficiently Learning an Encoder that Classifies Token Replacements Accurately (ELECTRA), and the Robustly Optimized BERT Pretraining Approach (RoBERTa), to support the development of an Indonesian question-answering system. The evaluation used Indonesian, Malay, and Esperanto. Esperanto served as a point of comparison for Indonesian because it is an international language that belongs to no person or country, which makes it neutral. Compared with other foreign languages, the structure and construction of Esperanto are relatively simple. The datasets were obtained by crawling Wikipedia for Indonesian and from the Open Super-large Crawled ALMAnaCH coRpus (OSCAR) for Esperanto. The token dictionary contained approximately 30,000 subword tokens for both the SentencePiece and byte-level byte pair encoding (ByteLevelBPE) methods. Tests were run with learning rates of 1e-5 and 5e-5 for both languages, following the reference values in the bidirectional encoder representations from transformers (BERT) paper. In the final results, the ALBERT and RoBERTa models in Esperanto produced loss values that did not differ much. This indicated that the RoBERTa model was the better choice for implementing an Indonesian question-answering system.
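The byte-level byte pair encoding mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' tokenizer or any library's implementation: the function names (`train_byte_level_bpe`, `merge_pair`, `encode`) are hypothetical, whitespace splitting stands in for real pre-tokenization, and the toy vocabulary size is far below the roughly 30,000 tokens reported in the study.

```python
from collections import Counter


def merge_pair(tokens, pair, new_id):
    """Replace every occurrence of the adjacent pair with new_id."""
    out = []
    i = 0
    while i < len(tokens):
        if i + 1 < len(tokens) and (tokens[i], tokens[i + 1]) == pair:
            out.append(new_id)
            i += 2
        else:
            out.append(tokens[i])
            i += 1
    return tuple(out)


def train_byte_level_bpe(corpus, vocab_size):
    """Learn BPE merges over UTF-8 bytes until the vocabulary reaches vocab_size."""
    # Each whitespace-separated word starts as a tuple of byte ids (0..255).
    words = Counter(tuple(word.encode("utf-8")) for word in corpus.split())
    merges = {}      # (left_id, right_id) -> new token id, in learned order
    next_id = 256    # ids 0..255 are reserved for the raw bytes
    while next_id < vocab_size:
        # Count every adjacent pair, weighted by word frequency.
        pair_counts = Counter()
        for word, freq in words.items():
            for pair in zip(word, word[1:]):
                pair_counts[pair] += freq
        if not pair_counts:
            break  # every word is a single token; nothing left to merge
        # Most frequent pair; ties are broken by the larger pair for determinism.
        best = max(pair_counts, key=lambda p: (pair_counts[p], p))
        merges[best] = next_id
        new_words = Counter()
        for word, freq in words.items():
            new_words[merge_pair(word, best, next_id)] += freq
        words = new_words
        next_id += 1
    return merges


def encode(text, merges):
    """Tokenize text by replaying the learned merges in order."""
    tokens = tuple(text.encode("utf-8"))
    for pair, new_id in merges.items():  # dicts preserve insertion order
        tokens = merge_pair(tokens, pair, new_id)
    return list(tokens)


if __name__ == "__main__":
    # Toy corpus; a real run would use the crawled text and vocab_size near 30,000.
    merges = train_byte_level_bpe("makan makan makan minum", vocab_size=258)
    print(encode("makan", merges))
```

Because the base vocabulary is the 256 possible byte values, any input string can be encoded without an unknown-token fallback, which is the main reason byte-level variants are attractive for multilingual corpora.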

Keywords:
Indonesian; Computer science; Security token; Natural language processing; Artificial intelligence; Encoder; Transformer; Byte; Question answering; Malay; Language model; Foreign language; Linguistics; Programming language; Philosophy

Metrics

Cited By: 7
FWCI (Field Weighted Citation Impact): 1.37
Refs: 31
Citation Normalized Percentile: 0.79

Topics

Educational Technology Systems
Physical Sciences → Computer Science → Artificial Intelligence
