JOURNAL ARTICLE

Simplified Self-Attention for Transformer-Based End-to-End Speech Recognition

Abstract

Transformer models have been introduced into end-to-end speech recognition and achieve state-of-the-art performance on various tasks owing to their superiority in modeling long-term dependencies. However, such improvements are usually obtained with very large neural networks. Transformer models mainly comprise two submodules: position-wise feed-forward layers and self-attention (SAN) layers. In this paper, to reduce model complexity while maintaining good performance, we propose a simplified self-attention (SSAN) layer that employs FSMN memory blocks instead of projection layers to form the query and key vectors for transformer-based end-to-end speech recognition. We evaluate the SSAN-based and the conventional SAN-based transformers on the public AISHELL-1 task and on internal 1000-hour and 20,000-hour large-scale Mandarin tasks. Results show that the proposed SSAN-based transformer achieves over 20% reduction in model parameters and a 6.7% relative CER reduction on the AISHELL-1 task. With an impressive 20% parameter reduction, our model shows no loss of recognition performance on the 20,000-hour large-scale task.
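To make the abstract's core idea concrete, the sketch below shows, in PyTorch, how an FSMN-style memory block (a depthwise 1-D convolution over time with a residual connection) might replace the query/key projection matrices of a standard self-attention layer. This is a minimal illustration of the idea, not the paper's implementation: the context widths, the use of two separate memory blocks for queries and keys, and the retained value/output projections are all assumptions made here for the sake of a runnable example.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F


class FSMNMemoryBlock(nn.Module):
    """FSMN-style memory block: per-dimension scalar taps over a local
    time window, realized as a depthwise 1-D convolution, plus a
    residual connection. Context widths are illustrative assumptions."""

    def __init__(self, d_model: int, left_context: int = 3, right_context: int = 3):
        super().__init__()
        kernel_size = left_context + right_context + 1
        # groups=d_model gives one scalar filter per feature dimension,
        # mirroring the per-dimension taps of an FSMN memory block.
        self.conv = nn.Conv1d(d_model, d_model, kernel_size,
                              padding=kernel_size // 2, groups=d_model, bias=False)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (batch, time, d_model)
        y = self.conv(x.transpose(1, 2)).transpose(1, 2)
        return x + y  # residual path preserves the input sequence


class SimplifiedSelfAttention(nn.Module):
    """Hedged sketch of an SSAN layer: queries and keys come from FSMN
    memory blocks rather than learned projection matrices; the value
    and output projections are kept as in standard multi-head attention."""

    def __init__(self, d_model: int, n_heads: int):
        super().__init__()
        assert d_model % n_heads == 0
        self.n_heads = n_heads
        self.d_head = d_model // n_heads
        self.memory_q = FSMNMemoryBlock(d_model)  # replaces W_q
        self.memory_k = FSMNMemoryBlock(d_model)  # replaces W_k
        self.w_v = nn.Linear(d_model, d_model)    # value projection retained
        self.w_o = nn.Linear(d_model, d_model)    # output projection retained

    def _split(self, x: torch.Tensor) -> torch.Tensor:
        b, t, _ = x.shape
        return x.view(b, t, self.n_heads, self.d_head).transpose(1, 2)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        q = self._split(self.memory_q(x))         # (batch, heads, time, d_head)
        k = self._split(self.memory_k(x))
        v = self._split(self.w_v(x))
        scores = q @ k.transpose(-2, -1) / self.d_head ** 0.5
        attn = F.softmax(scores, dim=-1)
        out = (attn @ v).transpose(1, 2).reshape(x.shape)
        return self.w_o(out)
```

The parameter-saving intuition is visible in the sketch: each dropped projection matrix costs d_model x d_model weights, while a depthwise memory block costs only about kernel_size x d_model, so the query/key path shrinks from roughly 2 * d_model^2 parameters to 2 * kernel_size * d_model per layer. This is consistent in direction with the over-20% reduction the abstract reports, though the exact savings depend on the full model configuration.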

Keywords:
Computer science, Transformer, End-to-end principle, Speech recognition, Feed forward, Artificial intelligence, Pattern recognition (psychology), Engineering, Voltage, Control engineering

Metrics

Cited by: 42
FWCI (Field-Weighted Citation Impact): 4.80
References: 43
Citation Normalized Percentile: 0.96 (in top 10%)

Topics

Speech Recognition and Synthesis (Physical Sciences → Computer Science → Artificial Intelligence)
Natural Language Processing Techniques (Physical Sciences → Computer Science → Artificial Intelligence)
Music and Audio Processing (Physical Sciences → Computer Science → Signal Processing)