JOURNAL ARTICLE

MULTIMODAL HUMAN-COMPUTER INTERACTION USING HAND, EYE AND VOICE GESTURES

Badanapalli Aparna, Nagaraju Vassey, Manne Naga VJ Manikanth

Year: 2025
Journal: Zenodo (CERN European Organization for Nuclear Research)
Publisher: European Organization for Nuclear Research

Abstract

This study presents a multimodal human-computer interaction (HCI) system that integrates hand gestures, eye gestures, and voice commands to control computer actions. Built in Python on the MediaPipe computer vision framework together with the SpeechRecognition and PyAutoGUI libraries, the system provides a highly customizable experience wherein users can train and map their own gestures or voice commands to particular actions such as mouse clicks, volume control, application launching, and screen interaction. The solution supports real-time gesture recognition, action mapping, and system control. This work demonstrates the potential for inclusive, accessible, and intuitive interfaces across platforms.
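The abstract gives no implementation details, but the named libraries suggest a straightforward pipeline. The following is a minimal sketch, not the authors' code, of how MediaPipe hand landmarks might drive the cursor through PyAutoGUI and how a SpeechRecognition utterance might be mapped to a key press; the pinch-to-click gesture, the voice-command table, and all thresholds are illustrative assumptions.

    # Illustrative sketch only: index-finger cursor control via MediaPipe and
    # PyAutoGUI, plus a simple voice-command mapping via SpeechRecognition.
    import cv2
    import mediapipe as mp
    import pyautogui
    import speech_recognition as sr

    # Hypothetical voice-to-key mapping; media keys are platform-dependent.
    VOICE_ACTIONS = {"volume up": "volumeup", "volume down": "volumedown"}

    def listen_once(recognizer: sr.Recognizer) -> None:
        """Capture one utterance and press the mapped key, if any."""
        with sr.Microphone() as source:
            audio = recognizer.listen(source, phrase_time_limit=3)
        try:
            text = recognizer.recognize_google(audio).lower()
            if text in VOICE_ACTIONS:
                pyautogui.press(VOICE_ACTIONS[text])
        except (sr.UnknownValueError, sr.RequestError):
            pass  # unrecognized speech or no network; ignore this utterance

    def run_hand_mouse() -> None:
        """Track one hand from the webcam and move the OS cursor with it."""
        mp_hands = mp.solutions.hands
        screen_w, screen_h = pyautogui.size()
        cap = cv2.VideoCapture(0)
        with mp_hands.Hands(max_num_hands=1,
                            min_detection_confidence=0.7) as hands:
            while cap.isOpened():
                ok, frame = cap.read()
                if not ok:
                    break
                frame = cv2.flip(frame, 1)  # mirror so motion feels natural
                results = hands.process(cv2.cvtColor(frame, cv2.COLOR_BGR2RGB))
                if results.multi_hand_landmarks:
                    lm = results.multi_hand_landmarks[0].landmark
                    tip = lm[mp_hands.HandLandmark.INDEX_FINGER_TIP]
                    thumb = lm[mp_hands.HandLandmark.THUMB_TIP]
                    # Landmarks are normalized to [0, 1]; scale to pixels.
                    pyautogui.moveTo(tip.x * screen_w, tip.y * screen_h)
                    # Pinch (thumb near index tip) as an example click gesture.
                    if (abs(tip.x - thumb.x) < 0.03
                            and abs(tip.y - thumb.y) < 0.03):
                        pyautogui.click()
                cv2.imshow("hand tracking", frame)
                if cv2.waitKey(1) & 0xFF == 27:  # Esc exits
                    break
        cap.release()
        cv2.destroyAllWindows()

    if __name__ == "__main__":
        run_hand_mouse()  # a voice loop would typically run in its own thread

In a complete system the gesture loop and the voice listener would run concurrently (for example, the voice loop on a background thread), with a shared gesture-to-action table so users can remap commands as the abstract describes.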

Keywords:
Gesture, Multimodal interaction, Python (programming language), Gesture recognition, Voice command device, Multimodality

Metrics

Cited By: 0
FWCI (Field Weighted Citation Impact): 0.00
References: 0
Citation Normalized Percentile: 0.47

Topics

Hand Gesture Recognition Systems
Gaze Tracking and Assistive Technology
Interactive and Immersive Displays
(each classified under Physical Sciences → Computer Science → Human-Computer Interaction)