A Real Time Speech to Text Conversion Technique for Bengali Language

Abdullah Umar Nasib; Humayun R.H. Kabir; Ruhan Ahmed; Jia Uddin

doi:10.1109/ic4me2.2018.8465680

Year: 2018
Journal: 2018 International Conference on Computer, Communication, Chemical, Material and Electronic Engineering (IC4ME2)
Pages: 1-4

Abstract

This paper presents a model for converting spoken Bengali to text. The proposed model uses the open-source framework Sphinx 4, which is written in Java and provides the tools needed to develop an acoustic model for a custom language such as Bengali. Our main objective was to ensure that the system was adequately trained, word by word, on recordings from various speakers so that it could reliably recognize new speakers. We used Audacity, a free digital audio workstation (DAW), to process the collected recordings: continuous frequency profiling to reduce noise and improve the signal-to-noise ratio (SNR), vocal leveling, normalization, and syllable splitting and merging, which together ensure an error-free 1:1 word mapping between each utterance and the text of its corresponding transcription file. To evaluate the performance of the proposed model, we use an audio dataset of speech recorded from 10 individual speakers, both male and female, together with custom transcript files that we wrote. Experimental results demonstrate that the proposed model achieves an average accuracy of 71.7% on our test dataset.
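The abstract does not say how the reported accuracy was computed. As a rough sketch only, the Python below assumes word-level exact-match scoring, first checking that recordings and transcripts map 1:1 and then averaging per-speaker accuracy. The function names, speaker IDs, utterance IDs, and romanized words are all hypothetical placeholders, not data or code from the paper.

```python
from collections import defaultdict


def per_speaker_accuracy(references, hypotheses):
    """Return {speaker_id: fraction of utterances recognized correctly}.

    Both arguments map (speaker_id, utterance_id) -> word. This layout is
    an assumption made for illustration, not the paper's file format.
    """
    # Enforce the 1:1 mapping: every utterance needs exactly one transcript
    # entry and exactly one recognition result.
    if references.keys() != hypotheses.keys():
        unmatched = references.keys() ^ hypotheses.keys()
        raise ValueError(f"utterances without a 1:1 match: {sorted(unmatched)}")

    correct = defaultdict(int)
    total = defaultdict(int)
    for key, ref_word in references.items():
        speaker = key[0]
        total[speaker] += 1
        if hypotheses[key] == ref_word:
            correct[speaker] += 1
    return {speaker: correct[speaker] / total[speaker] for speaker in total}


def average_accuracy(accuracies):
    """Unweighted mean of the per-speaker accuracies."""
    return sum(accuracies.values()) / len(accuracies)


# Hypothetical toy data: two speakers, two single-word utterances each.
references = {
    ("spk1", "u1"): "ek", ("spk1", "u2"): "dui",
    ("spk2", "u1"): "ek", ("spk2", "u2"): "dui",
}
hypotheses = {
    ("spk1", "u1"): "ek", ("spk1", "u2"): "dui",
    ("spk2", "u1"): "ek", ("spk2", "u2"): "tin",  # one misrecognition
}

acc = per_speaker_accuracy(references, hypotheses)
print(acc)                    # {'spk1': 1.0, 'spk2': 0.5}
print(average_accuracy(acc))  # 0.75
```

Averaging per speaker rather than over all utterances gives each speaker equal weight. Whether the paper's 71.7% figure was obtained this way is not stated.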

Keywords:

Computer science; Bengali; Speech recognition; Language model; Natural language processing; Utterance; Artificial intelligence; Acoustic model; Java; Transcription (linguistics); Word (group theory); Natural language; Normalization (sociology); Syllable; Speech corpus; Speech synthesis; Speech processing; Programming language; Linguistics

Metrics

FWCI (Field Weighted Citation Impact): 0.96

Citation Normalized Percentile: 0.80

Topics

Speech Recognition and Synthesis

Physical Sciences → Computer Science → Artificial Intelligence

Music and Audio Processing

Physical Sciences → Computer Science → Signal Processing

Speech and Audio Processing

Physical Sciences → Computer Science → Signal Processing
