Quốc Bảo Nguyễn, Jonas Gehring, Kevin Kilgour, Alex Waibel
We investigate several optimizations to a recently published architecture for extracting bottleneck features for large-vocabulary speech recognition with deep neural networks. Compared to MFCC baselines on a Tagalog conversational telephone speech corpus, we improve the relative word error rate reduction of first-pass systems from the previously reported 12% to 21%. This is achieved by using different input features, training the network to predict context-dependent targets, employing an efficient learning rate schedule, and varying several architectural details. Evaluations on two larger German and French speech transcription tasks show that the proposed optimizations are universally applicable and yield comparable gains on those corpora (19.9% and 22.8% relative, respectively).
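The abstract mentions an efficient learning rate schedule as one ingredient of the improvement but does not spell it out. A common choice in DNN acoustic-model training of this era is a "newbob"-style schedule, which holds the rate constant until validation improvement stalls and then decays it every epoch. The sketch below is an illustrative assumption, not the paper's actual schedule; the function name, thresholds, and decay factor are all hypothetical.

```python
def newbob_schedule(val_errors, initial_lr=1.0, decay=0.5, threshold=0.005):
    """Return the learning rate used at each epoch (illustrative sketch).

    The rate stays constant until the relative improvement in validation
    error falls below `threshold`; from then on it is multiplied by
    `decay` after every epoch (exponential ramp-down).
    """
    lrs = []
    lr = initial_lr
    ramping = False
    prev = None
    for err in val_errors:
        lrs.append(lr)  # rate used for this epoch
        if prev is not None:
            improvement = (prev - err) / prev
            if ramping or improvement < threshold:
                ramping = True   # once ramping starts, keep decaying
                lr *= decay
        prev = err
    return lrs
```

For example, with validation errors of 0.50, 0.45, 0.449, 0.448, the improvement between the second and third epochs falls below the threshold, so the rate is halved from the fourth epoch onward.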