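Ballot handwritten character recognition algorithm based on multi-grained cascade multi-layered gradient boosting decision trees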
  • Title: Handwritten character recognition algorithm based on multi-grained cascade multi-layered gradient boosting decision trees
  • Authors: XU Yingjie; LI Guoyong; HONG Wenhuan
  • Affiliations: Chengdu Institute of Computer Application, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • Keywords: intelligent election system; handwritten character; gradient boosting decision tree; multi-grained cascade multi-layered gradient boosting decision tree; threshold judgment method
  • Journal code: JSJY
  • Journal: Journal of Computer Applications
  • Publication date: 2019-07-20
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2019
  • Volume: v.39
  • Funding: Sichuan Science and Technology Support Program (2015GZ0088)
  • Language: Chinese
  • Record ID: JSJY2019S1006
  • Page count: 5
  • Issue: S1
  • CN: 51-1307/TP
  • Pages: 31-35
Abstract
Traditional algorithms such as template matching and the support vector machine (SVM) achieve low accuracy when recognizing handwritten characters in intelligent election vote-counting systems. To address this, an accurate and fast ballot handwritten character recognition algorithm based on multi-grained cascade multi-layered gradient boosting decision trees (gcmGBDTs) is proposed. First, multi-grained scanning samples each image step by step with sliding windows of several different sizes to obtain feature sub-samples; these are transformed by random forests (RF) and concatenated into a re-representation vector that is more abstract and robust than the original data. Next, cascaded multi-layered gradient boosting decision trees are trained layer by layer on these high-order representation vectors, and the resulting model classifies the characters. Finally, the proposed threshold judgment method determines whether a symbol is ambiguous: ambiguous symbols are sent for manual review, and all others are output directly, which keeps the accuracy of the recognition results high. Experimental results show that the algorithm is considerably more accurate than both template matching and SVM. Compared with gcForest, it improves test accuracy by 5.29% on average. Compared with a convolutional neural network (CNN), it improves test accuracy by 3.3% on average, shortens training time by 89.24%, and reduces recognition time at test by 48.61%.
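The scanning and threshold steps described in the abstract can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function names (sliding_windows, multi_grained_scan, threshold_decision), the window sizes, and the margin-based ambiguity rule are all hypothetical, because the abstract does not state the exact threshold criterion. The random forest transformation and the cascaded gradient boosting layers are omitted.

```python
from typing import Dict, List, Sequence, Tuple

Image = List[List[float]]


def sliding_windows(image: Image, size: int, stride: int = 1) -> List[List[float]]:
    """Flatten every size x size patch of a 2-D image into one feature sub-sample."""
    rows, cols = len(image), len(image[0])
    patches = []
    for top in range(0, rows - size + 1, stride):
        for left in range(0, cols - size + 1, stride):
            patch = [image[top + r][left + c]
                     for r in range(size) for c in range(size)]
            patches.append(patch)
    return patches


def multi_grained_scan(image: Image, sizes: Sequence[int],
                       stride: int = 1) -> Dict[int, List[List[float]]]:
    """Sample the image once per window size (the 'multi-grained' part)."""
    return {size: sliding_windows(image, size, stride) for size in sizes}


def threshold_decision(probs: Sequence[float],
                       margin: float = 0.2) -> Tuple[int, bool]:
    """Return (predicted class, needs_manual_review).

    Assumed rule: a symbol is ambiguous when the gap between the two
    highest class probabilities is below `margin`. The paper's actual
    criterion may differ.
    """
    ranked = sorted(range(len(probs)), key=lambda k: probs[k], reverse=True)
    best, runner_up = ranked[0], ranked[1]
    needs_review = (probs[best] - probs[runner_up]) < margin
    return best, needs_review


if __name__ == "__main__":
    # Toy 4x4 "image" with pixel values 0..15.
    toy = [[float(4 * r + c) for c in range(4)] for r in range(4)]
    scans = multi_grained_scan(toy, sizes=(2, 3))
    for size, patches in scans.items():
        print(size, len(patches), len(patches[0]))
    print(threshold_decision([0.90, 0.05, 0.05]))
    print(threshold_decision([0.48, 0.45, 0.07]))
```

In a full pipeline, each group of patches would be passed through trained random forests to produce class-probability vectors, which are then concatenated and fed to the cascade. Only the final class probabilities reach the threshold step, which routes low-margin predictions to manual review.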
