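Research and Implementation of Image Preprocessing and Frame-to-Frame Registration in an Electronic Reading Pen

Original English title: Research and Realization of Image Preprocessing and Registration in Electronic Reading-pen
Author: Liu Wei (刘伟)
Degree level: Master's
Discipline: Information and Communication Engineering
Keywords: electronic reading pen; shift-scanning; character recognition; image preprocessing; binarization; skew correction; text-line segmentation; image registration
Degree year: 2007
Advisor: Lin Jiayu (林嘉宇)
Discipline code: 081001
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Submission date: 2007-11-01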

Abstract

Image preprocessing and frame-to-frame registration are the core techniques of an electronic reading pen based on shift-scanning optical character recognition (OCR). This thesis builds a complete front-end processing module for shift-scanned images and studies and implements the algorithm used at each stage.
     The specific work is as follows:
     1. The AVI video and BMP bitmap file formats are parsed, and modules are implemented for extracting individual frames from an AVI file, converting RGB images to grayscale, and saving the stitched image produced by registration in BMP format. These modules provide the experimental tooling for the rest of the work.
     2. For text-image binarization, the Otsu global-threshold method and the Bernsen local-threshold method are implemented. A stepwise binarization method that combines the strengths of global and local thresholding is then implemented and gives good experimental results.
     3. A text-line skew-correction algorithm based on the Hough transform is implemented, and its computational complexity and its tendency to misdetect the skew angle are analyzed. A projection-based skew-angle detection method is adopted, with the integer Bresenham algorithm introduced into the projection computation to reduce its cost. The two projection criteria, maximum projection value and maximum blank-segment length, are then compared, and the maximum-projection-value criterion, which proved more accurate and effective, is selected.
     4. Text lines are segmented by horizontal projection, and text-line images of different sizes are normalized by bicubic interpolation.
     5. For frame-to-frame registration and stitching of text images, the SIFT algorithm, a registration algorithm based on character contours, and a projection-based registration algorithm are implemented. An improved projection registration method is proposed that is more robust to stretching distortion of the image.
     The thesis thus implements the front-end processing module of an electronic reading pen based on shift-scanning OCR. Simulation experiments show that the module performs well and provides a solid basis for back-end tasks such as character recognition.
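As an illustration of items 2 and 4 of the abstract, the following is a minimal sketch of Otsu global thresholding followed by text-line segmentation using the horizontal projection. It is not the thesis's implementation. The function names (`otsu_threshold`, `binarize`, `segment_rows`), the `min_ink` parameter, and the image representation (a list of rows of 8-bit gray levels, dark text on a light background) are assumptions made for the example.

```python
from typing import List, Tuple

Image = List[List[int]]  # assumed layout: rows of gray levels, 0 = black, 255 = white


def otsu_threshold(image: Image) -> int:
    """Return the gray level t that maximizes the between-class variance."""
    hist = [0] * 256
    for row in image:
        for v in row:
            hist[v] += 1
    total = sum(hist)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    w0 = 0    # number of pixels with gray level <= t
    sum0 = 0  # sum of the gray levels of those pixels
    for t in range(256):
        w0 += hist[t]
        sum0 += t * hist[t]
        w1 = total - w0
        if w0 == 0 or w1 == 0:
            continue  # one class is empty, variance undefined
        mean0 = sum0 / w0
        mean1 = (sum_all - sum0) / w1
        # Between-class variance up to the constant factor 1 / total**2.
        between = w0 * w1 * (mean0 - mean1) ** 2
        if between > best_var:
            best_var, best_t = between, t
    return best_t


def binarize(image: Image, t: int) -> Image:
    """Map pixels at or below t to 1 (ink) and all others to 0 (background)."""
    return [[1 if v <= t else 0 for v in row] for row in image]


def segment_rows(binary: Image, min_ink: int = 1) -> List[Tuple[int, int]]:
    """Split a binary image into text lines from its horizontal projection.

    Returns (start, end) row-index pairs with end exclusive. A row belongs
    to a text line when it contains at least min_ink ink pixels.
    """
    projection = [sum(row) for row in binary]
    lines: List[Tuple[int, int]] = []
    start = None
    for y, ink in enumerate(projection):
        if ink >= min_ink and start is None:
            start = y
        elif ink < min_ink and start is not None:
            lines.append((start, y))
            start = None
    if start is not None:
        lines.append((start, len(projection)))
    return lines


# Example: a 7-row synthetic image containing two dark bands of "text".
page = [[255] * 6 for _ in range(7)]
for y in (1, 2, 4, 5):
    page[y] = [255, 20, 20, 20, 20, 255]
text_lines = segment_rows(binarize(page, otsu_threshold(page)))
```

A purely global threshold like this one tends to fail under uneven illumination, which is presumably why the abstract combines it with the Bernsen local threshold. The projection-based segmentation also assumes the skew has already been corrected, as in item 3.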

