Rajesh Aggarwal

National Institute of Technology Kurukshetra, India

1 chapter authored

Chapters authored

Convolutional Neural Networks for Raw Speech Recognition

By Vishal Passricha and Rajesh Kumar Aggarwal

State-of-the-art automatic speech recognition (ASR) systems map a speech signal to its corresponding text. Traditional ASR systems are based on Gaussian mixture models. The emergence of deep learning has drastically improved the recognition rate of ASR systems, and deep learning-based systems are replacing traditional ones. These systems can also be trained in an end-to-end manner. End-to-end ASR systems are gaining popularity because of their simplified model-building process and their ability to map speech directly to text without any predefined alignments. The three major types of end-to-end architectures for ASR are attention-based methods, connectionist temporal classification, and convolutional neural network (CNN)-based direct raw speech models. This chapter discusses a CNN-based acoustic model for raw speech signals. The model establishes the relation between the raw speech signal and phones in a data-driven manner. The relevant features and the classifier are learned jointly from the raw speech. The raw speech is processed by the first convolutional layer to learn a feature representation. The output of the first convolutional layer, that is, the intermediate representation, is more discriminative and is further processed by the remaining convolutional layers. The system uses only a few parameters and performs better than traditional cepstral feature-based systems. Its performance is evaluated on TIMIT and is reported to be similar to that obtained with MFCC features.
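The pipeline described in the abstract (a convolutional layer applied directly to raw samples, followed by a classifier over phones) can be illustrated with a minimal sketch. This is not the chapter's model: it uses a single convolutional layer instead of several, random untrained weights, and hypothetical sizes (4 kernels of 30 samples, stride 10, 3 phone classes, a 25 ms window at an assumed 16 kHz sampling rate). All function names are invented for illustration.

```python
import math
import random


def conv1d(signal, kernels, stride):
    """Slide each kernel over the raw waveform (valid cross-correlation)."""
    width = len(kernels[0])
    n_frames = (len(signal) - width) // stride + 1
    return [
        [sum(k[j] * signal[t * stride + j] for j in range(width))
         for t in range(n_frames)]
        for k in kernels
    ]


def relu(channels):
    """Element-wise rectifier applied to every channel."""
    return [[max(0.0, v) for v in ch] for ch in channels]


def global_max_pool(channels):
    """Keep the strongest response of each kernel over time."""
    return [max(ch) for ch in channels]


def softmax(logits):
    """Numerically stable softmax over class scores."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def phone_posteriors(signal, kernels, weights, biases, stride):
    """Raw samples -> learned features -> posterior over phone classes."""
    features = global_max_pool(relu(conv1d(signal, kernels, stride)))
    logits = [sum(w * f for w, f in zip(row, features)) + b
              for row, b in zip(weights, biases)]
    return softmax(logits)


# Hypothetical input: 400 samples of a 440 Hz tone (25 ms at an assumed 16 kHz).
rng = random.Random(0)
signal = [math.sin(2 * math.pi * 440 * n / 16000) for n in range(400)]

# Untrained, randomly initialised parameters (assumed sizes, illustration only).
kernels = [[rng.uniform(-0.1, 0.1) for _ in range(30)] for _ in range(4)]
weights = [[rng.uniform(-0.1, 0.1) for _ in range(4)] for _ in range(3)]
biases = [0.0, 0.0, 0.0]

feature_maps = conv1d(signal, kernels, stride=10)
posteriors = phone_posteriors(signal, kernels, weights, biases, stride=10)
```

In a trained system the kernels and the classifier weights would be optimised jointly, which is the sense in which features and classifier are learned together from raw speech; here they are random, so the posteriors carry no meaning beyond showing the data flow.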

Part of the book: From Natural to Artificial Intelligence

Related collaborators

Hector Perez Meana

Jesus Olivares-Mercado

Karina Toscano-Medina

Gabriel Sanchez

Mariko Nakano Miyatake

Ernest Redondo

Vikram Singh

Jay Patel

Chih-Wei Lin

Luis Carlos Castro Madrid