Handwritten Chinese Character Recognition Using Deep Learning: An Overview

Some brief notes for those who wish to get some quick information on Handwritten Chinese Character Recognition (HCCR) using Deep Learning.

Origin

It’s plain to see that Chinese character recognition is a very challenging problem in pattern recognition and machine learning, because of its huge number of classes (compared with languages like English or French) and because writing styles vary greatly from one person to another, owing to the complexity of Chinese characters.
I am doing some research on this topic, so I have gathered some information, and this article is primarily a summary of it.
Traditional methods for recognizing Chinese characters mainly consist of three stages:

1. Preprocessing
2. Feature extraction
3. Classification

I won't expand too much on the specific methods, because they no longer perform well enough compared with methods built on Deep Learning.

Methods with Deep Learning

End-to-end CNN method:

  1. LeNet-5 CNN for handwritten English letter and digit recognition, Yann LeCun et al.
    This is the network that made CNNs a popular architecture; it achieves 99.05% on MNIST, a handwritten-digits database. A minimal CNN sketch in this spirit is given after this list.
  2. CNN with data augmentation, Simard et al.
    By using elastic and affine distortions, they achieved a better result (99.6%) than the original LeNet architecture; an elastic-distortion sketch also follows this list.
    These results concern only digits; the same methods can easily be applied to English letters, which are few in number as well. When it comes to Chinese characters, however, they are not sufficient.
  3. Chinese character recognition on 1000 classes, IDSIA.
    In 2011, researchers in Switzerland were the first to use GPUs to train CNNs, and achieved recognition over a relatively large number of classes. They got 89.12% on NIST SD19 and 99.72% on MNIST, and with this method they also obtained outstanding performance at ICDAR 2011.
  4. Multi-Column Deep Neural Networks (MCDNN), IDSIA.
    In 2012, scientists at IDSIA came up with a method that trains multiple CNNs on multiple GPUs and then performs a simple integration of their decisions (an averaging sketch follows this list), achieving state-of-the-art performance on the CASIA-OLHWDB1.1 online dataset. They also reached 93.5% on the offline dataset, better than the ICDAR 2011 results. In the ICDAR 2013 online and offline Chinese character recognition competitions (reported by Fei Yin et al.), both winning teams used CNN architectures.
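To make the end-to-end CNN idea concrete, below is a minimal LeNet-style classifier in PyTorch. It is only a sketch: the layer sizes, the 64x64 input and the 3755-way output (the size of the GB2312 level-1 character set commonly used in HCCR benchmarks) are illustrative assumptions, not the exact architectures from the papers above.

```python
import torch
import torch.nn as nn

class SmallCharCNN(nn.Module):
    """Minimal LeNet-style CNN sketch for character images.

    Input: 1 x 64 x 64 grayscale image; output: one score per class.
    Layer sizes and the 3755-way output are illustrative only.
    """
    def __init__(self, num_classes=3755):
        super().__init__()
        self.features = nn.Sequential(
            nn.Conv2d(1, 32, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 64 -> 32
            nn.Conv2d(32, 64, kernel_size=5, padding=2), nn.ReLU(),
            nn.MaxPool2d(2),                                  # 32 -> 16
        )
        self.classifier = nn.Sequential(
            nn.Flatten(),
            nn.Linear(64 * 16 * 16, 256), nn.ReLU(),
            nn.Linear(256, num_classes),
        )

    def forward(self, x):
        return self.classifier(self.features(x))

# usage: SmallCharCNN()(torch.randn(8, 1, 64, 64)) -> scores of shape (8, 3755)
```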
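The elastic distortion used for augmentation can be sketched in a few lines with SciPy: draw a random displacement field, smooth it with a Gaussian filter, and resample the image along the displaced coordinates. The alpha and sigma values below are illustrative, not the settings from Simard et al.

```python
import numpy as np
from scipy.ndimage import gaussian_filter, map_coordinates

def elastic_distort(image, alpha=34.0, sigma=4.0, rng=None):
    """Elastic distortion sketch: a Gaussian-smoothed random displacement
    field warps the character image (image: 2D float array)."""
    rng = np.random.default_rng() if rng is None else rng
    h, w = image.shape
    dx = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    dy = gaussian_filter(rng.uniform(-1, 1, (h, w)), sigma) * alpha
    y, x = np.meshgrid(np.arange(h), np.arange(w), indexing="ij")
    return map_coordinates(image, [y + dy, x + dx], order=1, mode="reflect")
```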
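The "simple integration of decisions" used by multi-column ensembles is, in its most basic form, just an average of the columns' softmax outputs. A minimal sketch, assuming `models` is a list of trained PyTorch classifiers:

```python
import torch

def ensemble_predict(models, images):
    """Average the softmax outputs of several independently trained CNNs,
    a minimal stand-in for multi-column decision integration."""
    with torch.no_grad():
        probs = torch.stack([torch.softmax(m(images), dim=1) for m in models])
    return probs.mean(dim=0).argmax(dim=1)   # one predicted class per image
```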

A few points should be clarified:
First, the difference between an online and an offline dataset is that an online dataset includes the stroke trajectories, which can be exploited, while an offline dataset does not have this advantage. The methods for the two settings are therefore slightly different, and state-of-the-art accuracy on online datasets has been slightly better than on offline datasets, at least until recently.
Second, three handwritten datasets were mentioned above: MNIST, NIST SD19 and CASIA. MNIST is a database of handwritten digits with a training set of 60,000 examples and a test set of 10,000 examples; it is a subset of a larger set available from NIST. NIST SD19 is a Special Database from NIST that contains NIST's entire corpus of training materials for handprinted document and character recognition. CASIA-OLHWDB and CASIA-HWDB are the online and offline Chinese handwriting databases built by the National Laboratory of Pattern Recognition (NLPR), Institute of Automation, Chinese Academy of Sciences (CASIA).

CNN method with additional information:

The end-to-end methods do not take full advantage of field-specific information. Such information can be beneficial to recognition, and it cannot easily be learned by the CNN itself.

  1. CNN with data generation.
    In many situations we do not have enough training samples, so data augmentation is a very important technique (the elastic-distortion sketch shown earlier is one typical way to generate such samples).
    In 2015, in Beyond Human Recognition: A CNN-Based Framework for Handwritten Character Recognition, Chen L. et al. proposed a deeper CNN architecture, incorporated data augmentation, and used an ensemble of 5 CNNs to vote; they reached 96.79% on the offline dataset, which is also the best result so far.
  2. CNN with shape normalization.
    Shape normalization can be viewed as a coordinate mapping in continuous 2D space between the original and the normalized character. Two of the algorithms used for shape normalization are Line Density Projection Interpolation (LDPI) for offline characters and pseudo 2D bi-moment normalization (P2DBMN) for online characters; a much simplified moment-normalization sketch is given after this list.
  3. CNN with directional features.
    For offline characters, since we do not have the stroke-trajectory information, we should make good use of the gradient of the image; for edge detection we can use the Sobel operator or Gabor features (a simplified directional-feature sketch also follows this list).
    For online characters, in his paper Spatially-sparse convolutional neural networks, Graham not only proposed a spatially-sparse CNN model but also introduced a method called the path signature, an online, time-sequenced feature-extraction method. Yang et al. obtained the best result so far on the NLPR (CASIA) data using this feature together with a CNN. Some directional features can also be applied to offline datasets; see the HCCR-GoogLeNet model.
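To illustrate shape normalization as a coordinate mapping, here is a heavily simplified moment-based normalization: the character is centred at its intensity centroid and rescaled by its second moments. This is not LDPI or P2DBMN from the papers above, only the basic idea of remapping coordinates before the image reaches the CNN; the output size and scale factor are arbitrary.

```python
import numpy as np
from scipy.ndimage import map_coordinates

def moment_normalize(image, out_size=64, scale=2.5):
    """Simplified moment normalization sketch (NOT LDPI / P2DBMN):
    centre at the intensity centroid, rescale by the second moments."""
    h, w = image.shape
    ys, xs = np.mgrid[0:h, 0:w].astype(float)
    mass = image.sum() + 1e-8
    cy, cx = (ys * image).sum() / mass, (xs * image).sum() / mass
    sy = np.sqrt(((ys - cy) ** 2 * image).sum() / mass) + 1e-8
    sx = np.sqrt(((xs - cx) ** 2 * image).sum() / mass) + 1e-8
    # inverse mapping: for every output pixel, look up its source coordinate
    oy, ox = np.mgrid[0:out_size, 0:out_size].astype(float)
    src_y = cy + (oy - out_size / 2) / out_size * 2 * scale * sy
    src_x = cx + (ox - out_size / 2) / out_size * 2 * scale * sx
    return map_coordinates(image, [src_y, src_x], order=1, mode="constant")
```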
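For the offline case, a simplified version of Sobel-gradient directional features might look as follows: the gradient at each pixel is decomposed into a fixed number of direction planes, which could then be stacked as extra input channels for the CNN. Real implementations usually distribute each gradient vector softly between the two nearest directions; this sketch uses a hard assignment for brevity.

```python
import numpy as np
from scipy.ndimage import sobel

def directional_feature_maps(image, num_directions=8):
    """Decompose the Sobel gradient of a 2D float image into
    `num_directions` direction planes (simplified directional features)."""
    gy, gx = sobel(image, axis=0), sobel(image, axis=1)
    magnitude = np.hypot(gx, gy)
    angle = np.arctan2(gy, gx)                                    # (-pi, pi]
    bins = ((angle + np.pi) / (2 * np.pi) * num_directions).astype(int) % num_directions
    planes = np.zeros((num_directions,) + image.shape)
    for d in range(num_directions):
        planes[d][bins == d] = magnitude[bins == d]
    return planes    # shape: (num_directions, H, W)
```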

Other refined CNN methods:

There are also methods that focus on changing the network architecture and the training process.

  1. Fractional max-pooling (FMP).
    This method was proposed by Graham; the main idea is to change the stride of max-pooling to a random fraction between 1 and 2, rather than an integer as before. This trick has been shown to prevent overly quick loss of spatial information. A usage sketch with PyTorch's built-in layer follows this list.
  2. DropSample.
    Yang et al. proposed this method. Its main idea is to give every sample a weight, based on the CNN's output confidence on that sample, and to adjust each sample's probability of being picked for training according to these weights. Several CNNs are trained this way and then integrated. This method has the best performance on ICDAR 2013 up to now. A rough weighted-sampling sketch follows this list.
  3. Relaxation CNN (R-CNN).
    The main feature of R-CNN is that it does not share weights in the convolutional layers, so that the neurons can learn different features separately; an obvious drawback is that this uses much more memory. A locally connected layer sketch follows this list.
  4. Alternately trained CNN (ART-CNN).
    This method proposes a strategy for dynamically adjusting the learning rate: after every N rounds it randomly picks a weight matrix and sets its learning rate to 0, which greatly improves training speed. A minimal sketch of this schedule also follows this list.
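PyTorch happens to ship a fractional max-pooling layer, so the FMP idea can be sketched directly. An output ratio between 0.5 and 1 corresponds to the "stride between 1 and 2" described above; the 0.7 here is an arbitrary illustrative value.

```python
import torch
import torch.nn as nn

# Fractional max-pooling: instead of halving the feature map (stride 2),
# shrink it by a random factor between 1 and 2 so that spatial information
# is lost more gradually.
fmp_block = nn.Sequential(
    nn.Conv2d(1, 32, kernel_size=3, padding=1), nn.ReLU(),
    nn.FractionalMaxPool2d(kernel_size=2, output_ratio=(0.7, 0.7)),
)

x = torch.randn(4, 1, 64, 64)
print(fmp_block(x).shape)   # roughly (4, 32, 44, 44): 64 * 0.7, rounded down
```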
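The weighted-sampling part of DropSample can be roughly sketched with torch's WeightedRandomSampler. The dummy data, the 3755-class labels and the weight-update rule below are illustrative assumptions, not the authors' exact scheme.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset, WeightedRandomSampler

# Dummy data purely for illustration.
dataset = TensorDataset(torch.randn(1000, 1, 64, 64), torch.randint(0, 3755, (1000,)))
weights = torch.ones(len(dataset))           # one sampling weight per sample

def update_weights(confidences, floor=0.1):
    # Example rule: down-weight samples the CNN is already confident about.
    # The actual DropSample update is more elaborate.
    return torch.clamp(1.0 - confidences, min=floor)

# Rebuild the sampler each epoch after updating `weights`.
sampler = WeightedRandomSampler(weights, num_samples=len(dataset), replacement=True)
loader = DataLoader(dataset, batch_size=128, sampler=sampler)
```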
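A convolution without weight sharing (the core of the relaxation idea) can be written as a locally connected layer: extract patches with unfold and give every output position its own filter bank. This is only a sketch of unshared weights, not the exact R-CNN architecture, and it makes the memory cost obvious: the weight tensor grows with the number of output positions.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class LocallyConnected2d(nn.Module):
    """Convolution-like layer WITHOUT weight sharing: every output position
    has its own filters (sketch of the relaxation-convolution idea)."""
    def __init__(self, in_ch, out_ch, in_h, in_w, kernel_size):
        super().__init__()
        self.k = kernel_size
        self.out_h, self.out_w = in_h - kernel_size + 1, in_w - kernel_size + 1
        n_pos = self.out_h * self.out_w
        # one (out_ch x in_ch*k*k) weight matrix per output position
        self.weight = nn.Parameter(0.01 * torch.randn(n_pos, out_ch, in_ch * kernel_size ** 2))
        self.bias = nn.Parameter(torch.zeros(out_ch, self.out_h, self.out_w))

    def forward(self, x):
        patches = F.unfold(x, self.k)                        # (N, in_ch*k*k, n_pos)
        out = torch.einsum("lok,nkl->nol", self.weight, patches)
        return out.reshape(x.size(0), -1, self.out_h, self.out_w) + self.bias

# usage: LocallyConnected2d(1, 16, 32, 32, kernel_size=5)(torch.randn(2, 1, 32, 32))
```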
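Following only the description above, the alternating learning-rate trick can be sketched by giving every weight tensor its own optimizer parameter group and zeroing one group's learning rate at random every N rounds; `model`, `base_lr` and the schedule are placeholders.

```python
import random
import torch

def build_optimizer(model, base_lr=0.01):
    # one parameter group per tensor, so each can receive its own learning rate
    groups = [{"params": [p], "lr": base_lr} for p in model.parameters()]
    return torch.optim.SGD(groups, lr=base_lr, momentum=0.9)

def alternate_freeze(optimizer, base_lr=0.01):
    # restore all learning rates, then freeze one randomly chosen weight tensor
    for group in optimizer.param_groups:
        group["lr"] = base_lr
    random.choice(optimizer.param_groups)["lr"] = 0.0
```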

RNN / LSTM method:

It is interesting, and actually quite reasonable, to think that an RNN's ability to incorporate sequence information could be favorable for Chinese character recognition. In 2015, researchers first tried to put an RNN on top of a CNN for character recognition: the CNN provides a feature sequence, which is then fed into the RNN. In fact, this is also a kind of end-to-end method; a minimal sketch of the pipeline is given below.
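A minimal sketch of this CNN-then-RNN pipeline: a small CNN turns the character image into a left-to-right sequence of column features, and an LSTM reads the sequence before classification. The layer sizes are invented for illustration and do not follow any particular paper.

```python
import torch
import torch.nn as nn

class ConvLSTMRecognizer(nn.Module):
    """Sketch: the CNN produces a feature sequence, an LSTM consumes it."""
    def __init__(self, num_classes=3755):
        super().__init__()
        self.cnn = nn.Sequential(
            nn.Conv2d(1, 32, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),   # 64 -> 32
            nn.Conv2d(32, 64, 3, padding=1), nn.ReLU(), nn.MaxPool2d(2),  # 32 -> 16
        )
        self.lstm = nn.LSTM(input_size=64 * 16, hidden_size=128, batch_first=True)
        self.fc = nn.Linear(128, num_classes)

    def forward(self, x):                       # x: (N, 1, 64, 64)
        f = self.cnn(x)                         # (N, 64, 16, 16)
        seq = f.permute(0, 3, 1, 2).flatten(2)  # (N, 16, 64*16): one step per column
        _, (h, _) = self.lstm(seq)              # h: (1, N, 128)
        return self.fc(h[-1])                   # class scores
```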

What’s next?

I think some big companies have already been doing research on this topic, and they have really big machines with, say, a thousand GPUs each. They can thus train an enormous end-to-end CNN that can probably classify ten thousand classes or more. But methods that use a little field-specific knowledge are also worth trying. Moreover, I think combining an RNN on top of a CNN is a promising direction whose popularity may keep growing. In summary, to achieve better and better results in HCCR, one or several of the following points should be paid attention to:

Adequate training dataset

Although we have several mature datasets, as introduced before, they are not enough. As a workaround, we could rely more on data augmentation to generate training data.

Field-specific knowledge

We can also draw on field-specific knowledge, which means making use of features such as the directional features of Chinese handwriting: Gabor and gradient features in offline HCCR, and the eight-directional feature in online HCCR.

Great computational power

Speaking in a more data-driven way, when it comes to computational power, the more the better. We can simply increase the size of the network, as long as the GPU memory can hold it and the training time stays acceptable.

References

JIN Lian-Wen, ZHONG Zhuo-Yao, YANG Zhao, YANG Wei-Xin, XIE Ze-Cheng, SUN Jun. Applications of Deep Learning for Handwritten Chinese Character Recognition: A Review. Acta Automatica Sinica, 2016, 42(8):1125-1141.
Yin F, Wang Q F, Zhang X Y, Liu C L. ICDAR 2013 Chinese handwriting recognition competition. In: Proceedings of the 12th International Conference on Document Analysis and Recognition. Washington, DC, USA: IEEE, 2013. 1464-1470
Graham B. Spatially-sparse convolutional neural networks. arXiv:1409.6070, 2014.
Cireşan D C, Meier U, Schmidhuber J. Transfer learning for Latin and Chinese characters with deep neural networks. In: Proceedings of the 2012 International Joint Conference on Neural Networks. Brisbane, QLD: IEEE, 2012. 1-6
Chen G, Zhang H G, Guo J. Learning pattern generation for handwritten Chinese character using pattern transform method with cosine function. In: Proceedings of the 2006 International Conference on Machine Learning and Cybernetics. Dalian, China: IEEE, 2006. 3329-3333
Yang W X, Jin L W, Liu M F. Character-level Chinese writer identification using path signature feature, dropstroke and deep CNN. arXiv:1505.04922, 2015.