JOURNAL ARTICLE

Dynamic Sign Language Recognition in Bahasa using MediaPipe, Long Short-Term Memory, and Convolutional Neural Network

Ivana Valentina Lemmuela, Mewati Ayub, Oscar Karnalim

Year: 2025
Journal: Journal of Information Systems Engineering and Business Intelligence
Vol: 11 (1), Pages: 17-29
Publisher: Airlangga University

Abstract

Background: Communication is important for everyone, including individuals with hearing and speech impairments. For this demographic, sign language is the primary medium of communication, both with others who share similar conditions and with hearing individuals who understand sign language. However, communication difficulties arise when individuals with these impairments interact with those who do not understand sign language.

Objective: This research aims to develop models capable of recognizing sign language movements in Bahasa and converting the detected gestures into corresponding words, with a focus on vocabulary related to religious activities. Specifically, the research examined dynamic sign language in Bahasa, which comprises gestures that require motion for proper demonstration.

Methods: In accordance with the research objective, the sign language recognition models were developed using a MediaPipe-assisted landmark extraction process. Recognition of dynamic sign language in Bahasa was achieved through the application of Convolutional Neural Network (CNN) and Long Short-Term Memory (LSTM) methods.

Results: The model developed with a bidirectional LSTM showed the best result, with a testing accuracy of 100%, whereas the best result for the CNN alone was 86.67%. Integrating CNN with LSTM improved performance compared to CNN alone, with the best CNN-LSTM model achieving an accuracy of 95.24%.

Conclusion: The bidirectional LSTM model outperformed the unidirectional LSTM by capturing richer temporal information, considering both past and future time steps. CNN alone could not match the effectiveness of the bidirectional LSTM, but combining CNN with LSTM produced better results. Normalized landmark data was also found to significantly improve accuracy. Accuracy was further influenced by shot-type variability and the landmark coordinates used: the dataset of straight-shot videos yielded more accurate results with x and y coordinates alone, whereas datasets with shot variation typically required x, y, and z coordinates for optimal accuracy.

Keywords: Convolutional Neural Network (CNN), Long Short-Term Memory (LSTM), MediaPipe, Sign Language
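The abstract reports that normalizing the MediaPipe landmark data significantly improved accuracy. The paper's exact normalization scheme is not given in this abstract, but a minimal sketch of one common approach (an assumption for illustration) is to make each frame's hand landmarks wrist-relative and scale-invariant before feeding them to the LSTM/CNN:

```python
import numpy as np

def normalize_landmarks(landmarks: np.ndarray) -> np.ndarray:
    """Normalize one frame of MediaPipe hand landmarks, shape (21, 2) for x, y.

    Translates the frame so the wrist (MediaPipe landmark index 0) sits at
    the origin, then divides by the largest absolute coordinate so all
    values lie in [-1, 1]. This removes variation in hand position and
    apparent size between frames, signers, and camera distances.
    """
    translated = landmarks - landmarks[0]   # wrist-relative coordinates
    scale = np.abs(translated).max()
    if scale == 0:                          # degenerate frame: all points coincide
        return translated
    return translated / scale

# Example with a synthetic frame; real input would come from MediaPipe Hands.
rng = np.random.default_rng(0)
frame = rng.random((21, 2)) * 0.2 + 0.4     # landmarks clustered near image center
frame[0] = [0.5, 0.5]                       # wrist position
norm = normalize_landmarks(frame)
print(norm[0])              # wrist maps to the origin
print(np.abs(norm).max())   # coordinates are bounded by 1
```

A sequence of such normalized frames, stacked into a (timesteps, features) array, would then form the input to the recurrent model; the same idea extends to (21, 3) arrays when the z coordinate is included, as the abstract suggests for shot-varied videos.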


Metrics

Cited By: 1
FWCI (Field Weighted Citation Impact): 6.16
Refs: 24
Citation Normalized Percentile: 0.85


Topics

Hand Gesture Recognition Systems
Physical Sciences →  Computer Science →  Human-Computer Interaction
Educational Technology Systems
Physical Sciences →  Computer Science →  Artificial Intelligence
English Language Learning and Teaching
Physical Sciences →  Computer Science →  Information Systems