A fast and accurate contour-based method for writer-dependent offline handwritten Farsi/Arabic subwords recognition
  • Authors: Kazim Fouladi (1) (2)
    Babak N. Araabi (1) (2)
    Ehsanollah Kabir (3)
  • Keywords: Farsi Persian Arabic subword; Contour alignment; Handwriting recognition; Writer-dependent; Lexicon reduction; Characteristic loci; IBN SINA database
  • Journal: International Journal on Document Analysis and Recognition
  • Publication date: June 2014
  • Volume: 17
  • Issue: 2
  • Pages: 181-203
  • Affiliations: Kazim Fouladi (1) (2)
    Babak N. Araabi (1) (2)
    Ehsanollah Kabir (3)

    1. Learning Intelligent Systems Lab, School of Electrical and Computer Engineering, College of Engineering, University of Tehran, Tehran, Iran
    2. School of Cognitive Sciences, Institute for Research in Fundamental Sciences (IPM), Tehran, Iran
    3. Department of Electrical and Computer Engineering, Tarbiat Modarres University, Tehran, Iran
  • ISSN:1433-2825
Abstract
This paper is concerned with the recognition of offline Farsi/Arabic handwriting. The overall appearance of each subword in Farsi/Arabic script is described by its shape contour, which provides a rich set of discriminative characteristics. Our approach is writer-dependent; that is, the system is trained to recognize the subwords written by a particular writer. A fast contour alignment is the central part of the proposed algorithm, where the alignment is performed based on a handful of feature points. An efficient lexicon reduction algorithm based on the characteristic loci feature, which works directly on binary subword images, is proposed as well. Fast and precise alignment, along with efficient lexicon reduction and appropriate similarity matching, yields a high recognition rate while keeping the speed high. Our experiment on the IBN SINA database shows that the correct classification rate can be as high as 91.08%. This figure is achieved merely by subword shape matching, without dots and diacritics, and without any statistical language model.
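
To make the lexicon reduction idea concrete, the following is a minimal Python sketch of a characteristic loci feature computed directly on a binary image, followed by a shortlist step. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not give the crossing cap, the distance measure, or the shortlist size, so `cap=2`, the L1 histogram distance, and the names `characteristic_loci`, `reduce_lexicon`, and `keep` are all hypothetical choices made for this example. In the usual formulation of the feature, each background pixel is coded by how many stroke runs lie to its left, right, above, and below it.

```python
import numpy as np

def run_counts(img):
    """For every pixel, count the ink runs to its left and right in its row."""
    padded = np.pad(img, ((0, 0), (1, 0)))  # zero column on the left edge
    # True where an ink run begins (background followed by ink).
    starts = (padded[:, 1:] == 1) & (padded[:, :-1] == 0)
    left = np.cumsum(starts, axis=1)
    right = left[:, -1:] - left  # exact for background pixels, which are the only ones used
    return left, right

def characteristic_loci(img, cap=2):
    """Normalized histogram of four-direction crossing codes over background pixels.

    cap is an assumed limit on the number of crossings counted per direction.
    """
    img = (np.asarray(img) > 0).astype(np.uint8)
    left, right = run_counts(img)
    up, down = run_counts(img.T)      # columns handled as rows of the transpose
    up, down = up.T, down.T
    base = cap + 1
    code = np.zeros(img.shape, dtype=np.int64)
    for counts in (left, right, up, down):
        code = code * base + np.minimum(counts, cap)
    hist = np.bincount(code[img == 0], minlength=base ** 4).astype(float)
    total = hist.sum()
    return hist / total if total > 0 else hist

def reduce_lexicon(query, lexicon, keep=3):
    """Return the `keep` lexicon labels whose loci histograms are closest (L1) to the query."""
    q = characteristic_loci(query)
    dist = {label: np.abs(q - characteristic_loci(image)).sum()
            for label, image in lexicon.items()}
    return sorted(dist, key=dist.get)[:keep]
```

In a pipeline like the one the abstract describes, a cheap shortlist of this kind would be followed by the more expensive contour alignment and similarity matching, applied only to the surviving candidates.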
