Journal of Cardiovascular Disease Research
Developing a CNN-Based Model for Sign Language Recognition and Translation to Text and Speech
V. Arun Kumar, A. Swathi, B. Sneha, B. Pranika, Ch. Meghana
JCDR. 2023: 633-645
Abstract
This work develops a continuous sign language (SL) recognition framework based on deep neural networks that transcribes videos of SL sentences directly into ordered sequences of gloss labels. Previous methods for continuous SL recognition usually employ hidden Markov models, which have limited capacity to capture temporal information. In contrast, our proposed architecture consists of two modules: a deep convolutional neural network with stacked temporal fusion layers for spatiotemporal feature extraction, and a bidirectional recurrent neural network for sequence learning. Because the available datasets are small, we find that end-to-end training cannot fully exploit a deep neural network of this complexity. To address this problem, we propose an iterative optimization process that trains the resulting convolutional neural network based bidirectional long short-term memory (CNN-BiLSTM) architecture effectively and exploits its representation capability with limited data. Gloss-level gestural supervision, obtained by forced alignment from the end-to-end system, is used to directly guide the training of the feature extractor. Afterwards, the BiLSTM system is fine-tuned with the improved feature extractor, and the updated system provides a further refined alignment for the feature extraction module. Through this iterative training strategy, the proposed CNN-BiLSTM keeps learning from, and benefiting from, the progressively refined gestural alignments. The implementation uses the SignumDataset, which contains 24 different signs.
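The abstract does not say how the forced alignment is computed. A minimal sketch, assuming the end-to-end CNN-BiLSTM outputs a per-frame log-probability for every gloss in the vocabulary and that each gloss of the sentence occupies exactly one contiguous, non-empty run of frames (no blank symbol), is a Viterbi-style dynamic program over monotonic alignments. The functions `forced_align` and `to_segments`, their input layout, and the toy probabilities are hypothetical illustrations, not the authors' code; the per-frame labels they produce are the kind of gloss-level supervision that could be used to train the feature extractor directly.

```python
import math
from typing import List, Sequence, Tuple


def forced_align(log_probs: Sequence[Sequence[float]],
                 glosses: Sequence[int]) -> List[int]:
    """Viterbi forced alignment of an ordered gloss sequence to video frames.

    Assumption: log_probs[t][g] is the log-probability of gloss g at frame t,
    as produced by some end-to-end model. glosses is the ground-truth gloss
    sequence of the sentence. Returns, for every frame, the position
    (0..len(glosses)-1) of the gloss it is assigned to. Each gloss covers one
    contiguous, non-empty run of frames, in order, and the summed
    log-probability of the assignment is maximised.
    """
    T, N = len(log_probs), len(glosses)
    if N == 0 or T < N:
        raise ValueError("need at least one gloss and one frame per gloss")

    neg_inf = -math.inf
    # score[t][j]: best log-prob of frames 0..t with frame t on gloss position j
    score = [[neg_inf] * N for _ in range(T)]
    # back[t][j]: gloss position of frame t-1 on that best path
    back = [[0] * N for _ in range(T)]

    score[0][0] = log_probs[0][glosses[0]]  # the first frame must start gloss 0
    for t in range(1, T):
        for j in range(N):
            stay = score[t - 1][j]                            # same gloss continues
            move = score[t - 1][j - 1] if j > 0 else neg_inf  # next gloss begins
            prev, best = (j, stay) if stay >= move else (j - 1, move)
            back[t][j] = prev
            if best > neg_inf:
                score[t][j] = best + log_probs[t][glosses[j]]

    # The last frame must belong to the last gloss; trace the path backwards.
    j = N - 1
    path = [j]
    for t in range(T - 1, 0, -1):
        j = back[t][j]
        path.append(j)
    path.reverse()
    return path


def to_segments(path: Sequence[int]) -> List[Tuple[int, int, int]]:
    """Collapse per-frame positions into (position, first_frame, last_frame)."""
    segments: List[Tuple[int, int, int]] = []
    for t, j in enumerate(path):
        if segments and segments[-1][0] == j:
            segments[-1] = (j, segments[-1][1], t)
        else:
            segments.append((j, t, t))
    return segments


if __name__ == "__main__":
    # Made-up posteriors for 4 frames over a 3-gloss vocabulary.
    probs = [[0.1, 0.1, 0.8], [0.2, 0.1, 0.7], [0.7, 0.2, 0.1], [0.8, 0.1, 0.1]]
    lp = [[math.log(p) for p in row] for row in probs]
    path = forced_align(lp, [2, 0])
    print(path)               # [0, 0, 1, 1]
    print(to_segments(path))  # [(0, 0, 1), (1, 2, 3)]
```

The sketch returns gloss positions rather than gloss IDs so that a sign repeated back-to-back in a sentence still yields two distinct segments. A model trained with CTC would normally also emit a blank symbol; this sketch deliberately leaves it out to keep the one-gloss-per-segment assumption explicit.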