Comparison of RNN, LSTM and GRU on Speech Recognition Data

Date

2018

Publisher

North Dakota State University

Abstract

Deep Learning [DL] provides an efficient way to train Deep Neural Networks [DNNs]. When used for end-to-end Automatic Speech Recognition [ASR] tasks, DNNs can produce more accurate results than traditional ASR systems. Standard feedforward neural networks are not well suited to speech data because they cannot persist past information, whereas Recurrent Neural Networks [RNNs] can retain past information and handle temporal dependencies. For this project, three recurrent networks, the standard RNN, Long Short-Term Memory [LSTM] networks, and Gated Recurrent Unit [GRU] networks, are evaluated to compare their performance on speech data. The data set used for the experiments is a reduced version of the TED-LIUM speech corpus. According to the experiments and their evaluation, LSTM achieved the best word error rate of the three networks, while GRU achieved results close to those of LSTM in less training time.
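
As an illustration of the three architectures compared in the abstract, the following minimal sketch (an assumption for illustration only, not the project's actual code) instantiates PyTorch's nn.RNN, nn.LSTM, and nn.GRU layers with placeholder dimensions, runs a batch of acoustic feature frames through each, and reports the output shape and parameter count. The dimensions and feature representation are hypothetical, not taken from the TED-LIUM experiments.

import torch
import torch.nn as nn

# Placeholder dimensions: 8 utterances, 100 frames, 40 acoustic features each.
batch, time_steps, n_features, hidden = 8, 100, 40, 128
x = torch.randn(batch, time_steps, n_features)

# The three recurrent layer types compared in this work.
layers = {
    "RNN":  nn.RNN(n_features, hidden, batch_first=True),
    "LSTM": nn.LSTM(n_features, hidden, batch_first=True),
    "GRU":  nn.GRU(n_features, hidden, batch_first=True),
}

for name, layer in layers.items():
    out, _ = layer(x)  # out has shape (batch, time_steps, hidden)
    n_params = sum(p.numel() for p in layer.parameters())
    print(f"{name}: output {tuple(out.shape)}, {n_params} parameters")

The parameter counts printed by this sketch reflect why the networks differ in training time: an LSTM layer carries roughly four times, and a GRU roughly three times, the weights of a plain RNN layer of the same size.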

Keywords

Recurrent neural networks; Long short-term memory networks; Gated recurrent unit networks; Speech recognition; Deep learning; Deep neural networks; TED-LIUM speech data
