Author: Shewalkar, Apeksha Nagesh
Date accessioned: 2018-12-18
Date available: 2018-12-18
Date issued: 2018
URI: https://hdl.handle.net/10365/29111

Abstract: Deep Learning [DL] provides an efficient way to train Deep Neural Networks [DNN]. DNNs, when used for end-to-end Automatic Speech Recognition [ASR] tasks, can produce more accurate results than traditional ASR. Standard feedforward neural networks are not well suited to speech data because they cannot retain past information, whereas Recurrent Neural Networks [RNNs] can retain past information and handle temporal dependencies. For this project, three recurrent networks, the standard RNN, Long Short-Term Memory [LSTM] networks, and Gated Recurrent Unit [GRU] networks, are evaluated to compare their performance on speech data. The data set used for the experiments is a reduced version of the TED-LIUM speech corpus. According to the experiments and their evaluation, LSTM achieved the best word error rate of the three networks, while GRU achieved results close to those of LSTM in less training time.

Rights: NDSU Policy 190.6.2 (https://www.ndsu.edu/fileadmin/policy/190.pdf)

Subjects: Recurrent neural networks; Long short-term memory networks; Gated recurrent unit networks; Speech recognition; Deep learning; Deep neural networks; TED-LIUM speech data; Neural networks (Computer science); Machine learning; Automatic speech recognition; Natural language processing (Computer science)

Title: Comparison of RNN, LSTM and GRU on Speech Recognition Data
Type: Master's paper
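As a rough illustration of the comparison described in the abstract, the sketch below instantiates the three recurrent layer types with otherwise identical settings and runs them on dummy feature sequences. The framework (PyTorch), feature dimension, hidden size, and depth are illustrative assumptions and are not taken from the paper itself.

```python
# Minimal sketch (not the paper's implementation): comparing RNN, LSTM, and
# GRU layers of identical shape on dummy speech-like feature sequences.
import torch
import torch.nn as nn

INPUT_DIM = 26     # assumed: acoustic features per frame (e.g. filterbanks)
HIDDEN_DIM = 128   # assumed hidden size
NUM_LAYERS = 2     # assumed depth

def build_recurrent_layer(kind: str) -> nn.Module:
    """Return an RNN, LSTM, or GRU layer with otherwise identical settings."""
    layer_cls = {"rnn": nn.RNN, "lstm": nn.LSTM, "gru": nn.GRU}[kind]
    return layer_cls(INPUT_DIM, HIDDEN_DIM, num_layers=NUM_LAYERS,
                     batch_first=True)

# A batch of 4 utterances, 100 frames each, INPUT_DIM features per frame.
features = torch.randn(4, 100, INPUT_DIM)

for kind in ("rnn", "lstm", "gru"):
    layer = build_recurrent_layer(kind)
    outputs, _ = layer(features)   # LSTM returns (h_n, c_n) as its state
    n_params = sum(p.numel() for p in layer.parameters())
    print(f"{kind.upper():4s} output {tuple(outputs.shape)}  params {n_params}")
```

The parameter counts printed by this sketch reflect the trade-off the abstract alludes to: a GRU has fewer gates (and hence fewer parameters) than an LSTM of the same size, which is one reason it can train faster while reaching comparable accuracy.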