Research on the Implementation of Intelligent Automatic Form Recognition, Restoration, and Generation
Abstract
Building on an analysis of neural network theory and of image processing and feature extraction theory, this paper designs and implements an automatic form recognition and restoration technique that combines neural networks with feature extraction. It discusses in depth the extraction, recognition, restoration, and generation of form lines, as well as automated character processing and entry, and shows through practice that the technique has both theoretical research significance and practical value.
     The paper studies artificial neural network theory and examines the choice of network form and algorithm, the implementation of the algorithm, the collection of training samples, the selection of network parameters, the shortcomings of the BP (backpropagation) algorithm, the extraction, restoration, and generation of form lines, and the recognition, restoration, and generation of characters. To address the shortcomings of the traditional BP algorithm, an improved BP algorithm with an adaptive learning factor is proposed and implemented, which raises the network's learning efficiency. Because the system must be robust to the differing character shapes and writing styles of different people, character features are extracted with a method based on chain-code features and concave-convex distribution features. A library of singular features is built, which largely resolves the problem of some features failing to converge and improves the network's learning efficiency and classification ability. The algorithms are implemented with an object-oriented approach, which keeps a large software system maintainable. Southwest Computer Company (西南计算机公司) adopted these algorithms and obtained satisfactory results in practical use, which supports the effectiveness of the method.
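The abstract names chain-code features but does not say how they are computed. As a minimal sketch, assuming the standard Freeman 8-direction chain code and a simple direction histogram as the fixed-length feature, the idea can be illustrated as follows. The function names `chain_code` and `direction_histogram` are hypothetical, and the contour is assumed to be an already-traced, ordered list of boundary pixels; this is not the paper's actual implementation.

```python
# Freeman 8-direction codes keyed by (dx, dy), using image coordinates
# (y grows downward). Code 0 points east; codes increase counter-clockwise.
# Assumption: this standard coding is used here for illustration only.
DIRECTIONS = {
    (1, 0): 0, (1, -1): 1, (0, -1): 2, (-1, -1): 3,
    (-1, 0): 4, (-1, 1): 5, (0, 1): 6, (1, 1): 7,
}


def chain_code(contour):
    """Return the Freeman chain code of a closed contour.

    `contour` is an ordered list of (x, y) pixel coordinates in which
    consecutive points, and the last and first point, are 8-neighbours.
    """
    codes = []
    n = len(contour)
    for i in range(n):
        x0, y0 = contour[i]
        x1, y1 = contour[(i + 1) % n]  # wrap around to close the contour
        step = (x1 - x0, y1 - y0)
        if step not in DIRECTIONS:
            raise ValueError(
                f"points {contour[i]} and {contour[(i + 1) % n]} "
                "are not 8-neighbours"
            )
        codes.append(DIRECTIONS[step])
    return codes


def direction_histogram(codes):
    """Normalised 8-bin histogram of chain codes (a fixed-length feature)."""
    hist = [0] * 8
    for c in codes:
        hist[c] += 1
    total = len(codes)
    return [h / total for h in hist] if total else hist


# Boundary of a 3x3 square, traced from the top-left corner:
# right along the top, down the right side, left along the bottom, up the left.
square = [(0, 0), (1, 0), (2, 0), (2, 1), (2, 2), (1, 2), (0, 2), (0, 1)]
square_codes = chain_code(square)
square_feature = direction_histogram(square_codes)
```

The histogram does not depend on where the trace starts and has the same length for every character, so it could in principle be fed to a neural network classifier as an input vector. Whether the paper uses a histogram or some other chain-code statistic is not stated in the abstract.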
