Vikramjit Mitra, Hosung Nam, Carol Espy-Wilson, Elliot Saltzman, Louis Goldstein
Previous studies have proposed ways to estimate articulatory information from the acoustic speech signal and have shown that, when used with standard cepstral features, it helps to improve word recognition performance in noise on a connected-digit recognition task. In this paper, we present results from word recognition and phone recognition experiments in noise that use two sets of articulatory representations: continuous (tract variable trajectories) and discrete (articulatory gestures), along with standard mel cepstral features, for acoustic modeling. The acoustic model is a dynamic Bayesian network (DBN) that treats the continuous articulatory information as observed random variables and the discrete articulatory representations as hidden random variables. Our results indicate that the use of articulatory information substantially improved noise robustness on both the word recognition and phone recognition tasks.