A hybrid CNN-LiGRU acoustic modeling using raw waveform sincnet for Hindi ASR

Ankit Kumar; Rajesh Kumar Aggarwal

doi:10.7494/csci.2020.21.4.3748

Authors

Ankit Kumar National Institute of Technology, Kurukshetra
Rajesh Kumar Aggarwal National Institute of Technology, Kurukshetra

DOI:

https://doi.org/10.7494/csci.2020.21.4.3748

Keywords:

Acoustic Modeling, ASR, RNN, CNN

Abstract

Deep Neural Networks (DNNs) currently play a vital role in Automatic Speech Recognition (ASR). Convolutional Neural Networks (CNNs) and Recurrent Neural Networks (RNNs) are advanced variants of the DNN. CNNs and RNNs are well suited to modeling the spatial and temporal properties of the speech signal, respectively, and both properties strongly affect recognition accuracy. At present, acoustic modeling approaches frequently alternate between CNNs and RNNs as the two architectures compete. In the last few years, CNNs operating on the raw speech signal have shown superiority over models that rely on precomputed acoustic features. Recently, a novel first convolution layer named SincNet was proposed to produce interpretable filters with better accuracy. In this work, we propose a hybrid SincNet-CNN-RNN architecture with low computation cost and high accuracy. Different configurations of the hybrid model were extensively examined to achieve this goal. All experiments were performed on a Hindi speech dataset.
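
To illustrate why SincNet filters are described as interpretable, the following is a minimal sketch of a single sinc-based band-pass kernel, assuming the parametrization from the original SincNet proposal (a difference of two windowed low-pass sinc filters defined only by a low and a high cutoff). The function names, the 300 to 3400 Hz band, the kernel length of 251, and the 16 kHz sample rate are illustrative assumptions and are not taken from this paper; in a trained network the two cutoffs would be learned per filter.

```python
import math


def sinc(x):
    """Unnormalized sinc: sin(x)/x, with the limit value 1 at x = 0."""
    return 1.0 if x == 0 else math.sin(x) / x


def sinc_bandpass(f_low, f_high, length, sample_rate):
    """Hamming-windowed band-pass kernel defined only by two cutoffs in Hz.

    Hypothetical helper for illustration; not the authors' implementation.
    """
    f1 = f_low / sample_rate    # normalized low cutoff (cycles per sample)
    f2 = f_high / sample_rate   # normalized high cutoff (cycles per sample)
    half = (length - 1) / 2
    kernel = []
    for i in range(length):
        n = i - half
        # Difference of two ideal low-pass filters yields a band-pass filter.
        ideal = (2 * f2 * sinc(2 * math.pi * f2 * n)
                 - 2 * f1 * sinc(2 * math.pi * f1 * n))
        # Hamming window smooths the truncation of the ideal response.
        window = 0.54 - 0.46 * math.cos(2 * math.pi * i / (length - 1))
        kernel.append(ideal * window)
    return kernel


def gain(kernel, freq, sample_rate):
    """Magnitude of the kernel's frequency response at one frequency in Hz."""
    re = sum(h * math.cos(2 * math.pi * freq * k / sample_rate)
             for k, h in enumerate(kernel))
    im = sum(h * math.sin(2 * math.pi * freq * k / sample_rate)
             for k, h in enumerate(kernel))
    return math.hypot(re, im)


# Assumed example values: a telephone-like band at a 16 kHz sample rate.
kernel = sinc_bandpass(f_low=300.0, f_high=3400.0, length=251, sample_rate=16000)
```

Because each filter is fully described by its two cutoff frequencies, the pass band can be read directly from the parameters, whereas a standard convolution layer would have to learn all 251 taps of such a kernel independently.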


Author Biographies

Ankit Kumar, National Institute of Technology, Kurukshetra

Research Scholar, Computer Engineering Department, National Institute of Technology, Kurukshetra

Rajesh Kumar Aggarwal, National Institute of Technology, Kurukshetra

Professor, Computer Engineering Department, National Institute of Technology, Kurukshetra
