Harman Preet Singh, Parminder Singh, Manjot Kaur Gill
In recent years, speech technology has advanced considerably, making speech synthesis an active area of research. A Text-To-Speech (TTS) system generates speech from text using a synthesis technique such as concatenative, formant, articulatory, or Statistical Parametric Speech Synthesis (SPSS). This work uses Deep Neural Network (DNN) based SPSS for the Punjabi language. The database used contains 674 audio files and a single text file containing 674 sentences. It was created at the Language Technologies Institute at Carnegie Mellon University (CMU) and is provided under the Festvox distribution. The Ossian toolkit is used as the front-end for text processing. Two DNNs are modeled using the Merlin toolkit: the duration DNN maps linguistic features to duration features of speech, and the acoustic DNN maps linguistic features to acoustic features. Subjective evaluation using the Mean Opinion Score (MOS) shows that this TTS system achieves good naturalness, with a score of 80.2%.
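The abstract does not state how the MOS was converted into a percentage. The following minimal sketch shows one common convention, assuming listeners rate each utterance on a 1 to 5 scale and the mean rating is divided by the scale maximum. The function name `mos_percentage`, the 5-point scale, and the ratings below are illustrative assumptions, not the authors' procedure or data.

```python
from statistics import mean


def mos_percentage(ratings, scale_max=5):
    """Return (MOS, MOS as a percentage of the scale maximum).

    Assumes ratings on a 1..scale_max opinion scale; this convention
    is an assumption, not something stated in the abstract.
    """
    if not ratings:
        raise ValueError("ratings must not be empty")
    if any(r < 1 or r > scale_max for r in ratings):
        raise ValueError("each rating must lie in 1..scale_max")
    mos = mean(ratings)
    return mos, 100.0 * mos / scale_max


# Hypothetical listener ratings, not data from the study.
ratings = [4, 5, 4, 3, 4, 4, 5, 3, 4, 4]
mos, pct = mos_percentage(ratings)
print(f"MOS = {mos:.2f}, naturalness = {pct:.1f}%")
```

Under this assumed convention, a reported naturalness of 80.2% would correspond to a mean rating of about 4.01 out of 5.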
Heiga Zen, Andrew Senior, Mike Schuster
Gurjit Kaur, Kamaldeep Kaur, Parminder Singh