[1] ZHANG Zhenyu, JIANG Heyun, FAN Mingyu. An Efficient Artificial Intelligence Method for Automatic Recognition of Bank Bill Text[J]. Journal of Wenzhou University (Natural Science Edition), 2020, (03): 047-56.

An Efficient Artificial Intelligence Method for Automatic Recognition of Bank Bill Text

Journal of Wenzhou University (Natural Science Edition) [ISSN: 1674-3563 / CN: 33-1344/N]

Volume:
Issue:
2020, No. 03
Pages:
047-56
Column:
Computer Science
Publication date:
2020-08-25

Article Info

Title:
An Efficient Artificial Intelligence Method for Automatic Recognition of Bank Bill Text
Author(s):
ZHANG Zhenyu (张振宇) 1, JIANG Heyun (姜贺云) 1, FAN Mingyu (樊明宇) 2
1. College of Mathematics and Physics, Wenzhou University, Wenzhou, Zhejiang 325035, China; 2. College of Computer Science and Artificial Intelligence, Wenzhou University, Wenzhou, Zhejiang 325035, China
Keywords:
text detection; Faster R-CNN (faster region-based convolutional neural network); text recognition; CRNN (convolutional recurrent neural network); SRU (simple recurrent unit)
CLC number:
TP391.41
Document code:
A
Abstract:
Building on deep learning and digital image processing, we design and implement an accurate system for automatic recognition of text in bank bill images. First, the captured bill image is deskewed. Next, we propose a single-character detection method based on Faster R-CNN: the RPN is tuned to the layout of Chinese text so that it generates suitable character proposal regions, and RoIPool is replaced with RoIAlign, which avoids RoIPool's two quantization steps and preserves more accurate spatial positions for the text. The detected characters are then merged into string fields according to a set of rules. Finally, the CNN + BiSRU + CTC network proposed in this paper recognizes each located string image as a line of text. Compared with the popular CNN + BiLSTM + CTC model, our model speeds up inference without losing accuracy. Experiments on real-world data show that the method performs well at both text detection and text recognition for bills from specific business scenarios.
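The abstract's remark that RoIAlign avoids two quantization steps can be illustrated along a single axis. The following is a minimal sketch, not the authors' implementation: the function names, the feature stride of 16, the two output bins, and the example RoI are all assumptions chosen for illustration. The RoIPool-style rounding follows the commonly used round-then-floor/ceil scheme (without clipping to the feature map), and the RoIAlign-style sampling is simplified to one sample per bin.

```python
import math


def roi_pool_bins(start, end, stride, bins):
    """RoIPool-style bins along one axis, as integer cell ranges [lo, hi)."""
    # Quantization 1: round the RoI edges to whole feature-map cells.
    lo = math.floor(start / stride + 0.5)
    hi = math.floor(end / stride + 0.5)
    size = max(hi - lo + 1, 1)
    ranges = []
    for i in range(bins):
        # Quantization 2: round each bin boundary to whole cells.
        b0 = lo + math.floor(i * size / bins)
        b1 = lo + math.ceil((i + 1) * size / bins)
        ranges.append((b0, b1))
    return ranges


def roi_align_points(start, end, stride, bins):
    """RoIAlign-style sample positions along one axis (no rounding)."""
    lo, hi = start / stride, end / stride
    step = (hi - lo) / bins
    # One sample at the centre of each bin (a simplification).
    return [lo + (i + 0.5) * step for i in range(bins)]


def lerp(feat, x):
    """Linearly interpolate a 1D feature row at continuous position x."""
    x = min(max(x, 0.0), len(feat) - 1.0)
    x0 = math.floor(x)
    x1 = min(x0 + 1, len(feat) - 1)
    t = x - x0
    return feat[x0] * (1 - t) + feat[x1] * t


# Hypothetical example: a feature row whose value equals its cell index,
# and an RoI spanning image pixels 37..101 with an assumed stride of 16.
feat = [float(i) for i in range(10)]
pooled = [max(feat[a:b]) for a, b in roi_pool_bins(37, 101, 16, 2)]
aligned = [lerp(feat, x) for x in roi_align_points(37, 101, 16, 2)]
print(pooled)   # [4.0, 6.0]
print(aligned)  # [3.3125, 5.3125]
```

In this toy row the continuous bin centres are 3.3125 and 5.3125, and the RoIAlign-style samples land exactly there. The RoIPool-style bins snap to cells 2-4 and 4-6, so the pooled regions are shifted and overlap at cell 4, which is the kind of positional error that matters for small targets such as single characters.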

Memo

Received: 2019-11-26
About the author: ZHANG Zhenyu (1994- ), male, from Luoyang, Henan; master's student; research interest: intelligent systems and control
Last Update: 2020-08-25