Fairuz Iqbal Maulana, Yaya Heryadi, Gede Putra Kusuma, Widodo Budiharto
INMAD is a dataset containing a parallel corpus of English-Indonesian-Madurese translated sentences. The corpus contains 23,086 sentences together with their translations in Indonesian and English. The Madurese translations cover one language level, namely the 'engghi-enten' level. The framework for creating the dataset consists of two stages. First, several parallel corpus sources are combined to build the sentence corpus and improve its quality. Second, the data are augmented by back-translation using MarianMT, and the resulting parallel data are combined with the original parallel corpus. INMAD was validated by a Madurese language specialist, who also served as the translator for the source of this dataset. Consequently, this dataset can serve as a primary resource for Natural Language Processing (NLP) research, particularly for Madurese at the 'engghi-enten' level.
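The two stages described above can be sketched roughly as follows. This is a minimal illustration, not the authors' pipeline: the abstract does not say which language pair was back-translated or which MarianMT checkpoint was used, so the translator is passed in as a parameter. The helper names (`make_marian_translator`, `back_translate`, `combine`) and the deduplication rule are assumptions made for the example; only `MarianMTModel` and `MarianTokenizer` come from the Hugging Face `transformers` library.

```python
from typing import Callable

Pair = tuple[str, str]  # (source sentence, target sentence)
Translator = Callable[[list[str]], list[str]]


def make_marian_translator(model_name: str) -> Translator:
    """Wrap a MarianMT checkpoint as a batch translation function.

    The checkpoint name is supplied by the caller; which one the
    dataset authors used is not stated in the abstract.
    """
    # Imported lazily so the rest of the sketch runs without transformers.
    from transformers import MarianMTModel, MarianTokenizer

    tokenizer = MarianTokenizer.from_pretrained(model_name)
    model = MarianMTModel.from_pretrained(model_name)

    def translate(sentences: list[str]) -> list[str]:
        batch = tokenizer(sentences, return_tensors="pt",
                          padding=True, truncation=True)
        generated = model.generate(**batch)
        return tokenizer.batch_decode(generated, skip_special_tokens=True)

    return translate


def back_translate(pairs: list[Pair], target_to_source: Translator) -> list[Pair]:
    """Build synthetic pairs by translating each target sentence back
    into the source language and pairing it with the original target."""
    targets = [tgt for _, tgt in pairs]
    synthetic_sources = target_to_source(targets)
    return list(zip(synthetic_sources, targets))


def combine(original: list[Pair], augmented: list[Pair]) -> list[Pair]:
    """Concatenate two corpora, keeping order and dropping empty or
    exactly duplicated pairs (an assumed cleaning rule)."""
    seen: set[Pair] = set()
    merged: list[Pair] = []
    for src, tgt in [*original, *augmented]:
        key = (src.strip(), tgt.strip())
        if all(key) and key not in seen:
            seen.add(key)
            merged.append(key)
    return merged
```

With a real model, `back_translate(corpus, make_marian_translator(name))` would produce the synthetic pairs, where `name` is whichever MarianMT checkpoint matches the chosen translation direction. Keeping the translator as a plain callable also lets the combine step be checked with a stub function before any model is downloaded.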
Zijian Li, Chengying Chi, Yunyun Zhan
Hitoshi Ito, Naoto Shirai, Kazutaka Kinugawa, Hideya Mino, Yoshihiko Kawai
Sainik Kumar Mahata, Jyoti Gupta, Khusboo Kumari, Monalisa Dey, Anupam Mondal, Darothi Sarkar