Classifying handwritten characters of Devanagari script

Mayank Mishra
Tanmay Sarkar
Tanupriya Choudhury

Project Overview


Devanagari is a Northern Brahmic script related to many other South Asian scripts, including Gujarati, Bengali, and Gurmukhi, and, more distantly, to a number of South-East Asian scripts including Thai, Balinese, and Baybayin. More than 100 languages spoken in India and Nepal, including Sanskrit, Hindi, Nepali, Marathi, Bhojpuri, and Maithili, are written in the Devanagari script. However, progress in building Optical Character Recognition (OCR) systems for the Devanagari script has been limited compared to Latin scripts. Recent advances in deep learning have opened opportunities to create an efficient character recognition system for the Devanagari script. In this project, we exploit the strength of a deep neural network with a ResNet architecture. Our ResNet contains eighty-five convolution layers and records a higher accuracy than previous works on the Devanagari Handwritten Character Dataset (DHCD).

Challenges


There are several challenges to building a comprehensive character recognition system for the Devanagari script.

  • High complexity: The Devanagari script contains complex formations and structures. Recognizing the intricate composition of its characters is a difficult task for a recognition system.

  • A high degree of resemblance between characters: Many characters written in the Devanagari script are distinguished only by minute differences. Characters like ma and bha are differentiated by a small curve below the head stroke (Shirorekha), ta and ddh by a small curve at the bottom, and kna and da are similar except for a small dot near the character.

  • Inconsistent use: Languages written in the Devanagari script are used by people across widespread locations. Over time, the way the script is written has undergone slight modifications tied to a particular place, era, cultural influence, etc. Such changes have produced several variants of the script, which makes it challenging to build a generalized recognition model.

Dataset


We have used the Devanagari Handwritten Character Dataset (DHCD) to train and test our image classification model. This dataset was published in "Deep Learning Based Large Scale Handwritten Devanagari Character Recognition" and has been used as a standard dataset for Devanagari character recognition. The dataset consists of 78,200 images in the training set and 13,800 images in the testing set. Researchers scanned hundreds of handwritten documents by different writers and cropped each character manually. Each grayscale image is 32x32 pixels in dimension, with white characters displayed on a dark background.
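The split sizes above can be checked with a little arithmetic. The following is only a sketch: the image counts and the 32x32 size come from the text, while `NUM_CLASSES = 46` and the assumption that classes are evenly balanced are not stated in this section and are taken from the dataset's public description.

```python
# Figures stated in the text.
TRAIN_IMAGES = 78_200
TEST_IMAGES = 13_800
IMAGE_SIDE = 32

# Assumption (not stated in this section): DHCD has 46 balanced classes.
NUM_CLASSES = 46


def dataset_summary():
    """Derive basic statistics from the stated split sizes."""
    total = TRAIN_IMAGES + TEST_IMAGES
    return {
        "total_images": total,
        "test_fraction": TEST_IMAGES / total,
        "pixels_per_image": IMAGE_SIDE * IMAGE_SIDE,
        # Only meaningful if the classes really are evenly balanced.
        "train_per_class": TRAIN_IMAGES / NUM_CLASSES,
        "test_per_class": TEST_IMAGES / NUM_CLASSES,
    }


if __name__ == "__main__":
    for name, value in dataset_summary().items():
        print(f"{name}: {value}")
```

Under these assumptions the testing set is 15 percent of the 92,000 images, and each class would contribute 1,700 training and 300 testing images.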

Figure 1: Characters of the Devanagari script

Model architecture


ResNet has made it possible to train very deep neural networks without compromising accuracy. We have exploited the power of a deep neural network by implementing a ResNet architecture with eighty-five convolution layers.

  • The first layer in the model architecture is a batch normalization layer, which adds an extra level of normalization. It is followed by a 3 x 3 convolution layer with 64 filters.

  • Next, three sets of residual modules are stacked. Each module uses the bottleneck, pre-activation variant of the residual module.

    • In the first set, nine residual modules are placed together, with 64 filters in the third layer of each residual module in the set. The spatial dimensions are not reduced when the input enters this set. An extra convolution layer is added to the shortcut branch of the first residual module of the set.

    • In the second set, another nine residual modules are stacked. Each residual module within this set learns 128 filters in its third layer. The spatial dimensions are reduced when the data enters this set, using a stride of (2,2) in the first residual module. An extra convolution layer is added to the shortcut branch of this first residual module.

    • The last set also stacks nine residual modules together. Each residual module within this set learns 256 filters in its third layer. As in the second set, the spatial dimensions are reduced when the data enters this set, using a stride of (2,2) in the first residual module, and an extra convolution layer is added to the shortcut branch of this first residual module.

  • Finally, the dimensions are reduced using an average pooling layer. The output is flattened, and a softmax function is applied last to make the final predictions.
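The description above can be tallied to see where the eighty-five convolution layers come from. This is a minimal sketch, not the project's code: it assumes each bottleneck module holds three convolution layers (the text refers to a "third layer"), and that padding preserves spatial size so a stride of (2,2) halves it. The names `SETS`, `count_conv_layers`, and `trace_spatial_size` are hypothetical.

```python
# One 3 x 3 convolution follows the initial batch normalization layer.
STEM_CONVS = 1
# Assumption: a bottleneck residual module has three convolution layers.
CONVS_PER_MODULE = 3
MODULES_PER_SET = 9
# (filters in the third layer, stride of the first module) for each set.
SETS = [(64, 1), (128, 2), (256, 2)]


def count_conv_layers():
    """Count convolutions: stem + main branches + extra shortcut convolutions."""
    main_branch = len(SETS) * MODULES_PER_SET * CONVS_PER_MODULE
    # One extra convolution on the shortcut of the first module in each set.
    shortcut = len(SETS)
    return STEM_CONVS + main_branch + shortcut


def trace_spatial_size(input_size=32):
    """Return (filters, spatial size) after each set, assuming size-preserving padding."""
    sizes = []
    size = input_size
    for filters, stride in SETS:
        size //= stride
        sizes.append((filters, size))
    return sizes


if __name__ == "__main__":
    print(count_conv_layers())   # 85
    print(trace_spatial_size())  # [(64, 32), (128, 16), (256, 8)]
```

The tally is 1 + 3 x 9 x 3 + 3 = 85, which matches the stated layer count. Under the padding assumption, a 32x32 input reaches the average pooling layer as an 8x8 feature map with 256 channels.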

Figure 2: Model architecture

Training and Result


The model is trained on the training set for 80 epochs. The stochastic gradient descent algorithm is used for optimization. The initial learning rate is set to 1 and the momentum to 0.9. Over the course of training, a linear learning rate decay is applied. Data augmentation is performed on all the images in each training batch. Categorical cross-entropy is used as the loss function, and the performance of the model is evaluated using the accuracy metric.
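The optimization settings above can be illustrated with a small sketch. Two things here are assumptions rather than details from the text: the learning rate is taken to decay linearly to zero by the end of epoch 80 (the text only says the decay is linear), and the momentum update uses one common formulation of SGD with momentum. The function names are hypothetical.

```python
def linear_decay(epoch, initial_lr=1.0, total_epochs=80):
    """Learning rate for a zero-based epoch index.

    Assumption: the rate falls linearly from initial_lr toward zero
    over total_epochs; the end point is not stated in the text.
    """
    return initial_lr * (1.0 - epoch / total_epochs)


def sgd_momentum_step(weight, velocity, gradient, lr, momentum=0.9):
    """One scalar SGD-with-momentum update (a common formulation).

    The velocity accumulates past gradients, and the weight moves along it.
    """
    velocity = momentum * velocity - lr * gradient
    weight = weight + velocity
    return weight, velocity


if __name__ == "__main__":
    for epoch in (0, 20, 40, 79):
        print(epoch, round(linear_decay(epoch), 4))
```

With these assumptions, the rate starts at 1 in the first epoch, reaches half its initial value at epoch 40, and is close to zero in the final epoch.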

Figure 3: Performance of the model

An accuracy of 99.72 percent is obtained on the testing set. This is the highest accuracy recorded on the Devanagari Handwritten Character Dataset*. Figure 3 illustrates the performance of the model over the course of training. A gradual decline in the loss is seen between epochs 6 and 55. Even after the testing loss and the training loss start to diverge, the difference between the two remains consistently small. This result indicates that overfitting is controlled even when training a deep neural network. We witness a plateau in learning close to epoch 80 and terminate the training. Figure 4 shows the successful recognition of a few sample images by the trained model.
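To put the reported accuracy in terms of image counts, the short calculation below converts it into an approximate number of misclassified testing images. It is only an estimate: the accuracy is reported to two decimal places, so the exact count may differ by one.

```python
TEST_IMAGES = 13_800        # testing set size stated in the Dataset section
REPORTED_ACCURACY = 0.9972  # 99.72 percent, as reported


def approximate_errors(num_images=TEST_IMAGES, accuracy=REPORTED_ACCURACY):
    """Estimate correct and misclassified counts from a rounded accuracy."""
    correct = round(num_images * accuracy)
    return correct, num_images - correct


if __name__ == "__main__":
    correct, errors = approximate_errors()
    print(f"about {correct} correct, about {errors} misclassified")
```

The estimate works out to roughly 39 misclassified images out of 13,800.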

Figure 4: Sample predictions

*To the best of our knowledge


The dataset used can be found here.
Code implementation of the project can be found here.
The corresponding paper can be found here.

Webpage last updated: June, 2021