Comprehensive synthetic Arabic database for on/off-line script recognition research
  • Authors: Raid M. Saabni (1)
    Jihad A. El-Sana (2)
  • Keywords: Arabic; Script; Recognition; Database; Synthetic; PCA; K-means
  • Journal: International Journal on Document Analysis and Recognition
  • Year: 2013
  • Publication date: September 2013
  • Volume: 16
  • Issue: 3
  • Pages: 285-294
  • Affiliations: Raid M. Saabni (1)
    Jihad A. El-Sana (2)

    1. Triangle R&D Center, Kafr Qari, Israel
    2. Department of Computer Science, Ben-Gurion University of the Negev, Beersheba, Israel
Abstract
Developing and maintaining large, comprehensive databases for script recognition that include different shapes for each word in the lexicon is expensive and difficult. In this paper, we present an efficient system that automatically generates prototypes for each word in a lexicon using multiple appearances of each letter. Large sets of different shapes are created for each letter in each position, and these sets are then used to generate valid shapes for each word-part. The number of valid permutations for each word is so large that training and searching over all of them is impractical for tasks such as script recognition and word spotting. We therefore apply dimensionality reduction (PCA) and clustering (k-means) to keep a compact representation of these databases without reducing their ability to represent the wide variety of handwriting styles. In addition, a database for off-line script recognition is generated from the on-line strokes using a standard dilation technique, with special care taken to resemble the pen's path. We also examined and applied several layout techniques for producing words from the generated word-parts. Our experimental results show that the proposed system can automatically generate large databases whose quality is at least as good as that of manually generated ones.
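The compaction step described above (enumerate letter-shape combinations for a word-part, reduce dimensionality, cluster, keep a small set of prototypes) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the abstract does not give their stroke features, joining rule, or parameters, so the arc-length resampling, the centroid/extent normalization, the "start each letter where the previous one ended" joining rule, and every function and parameter name here (`select_prototypes`, `n_points`, `n_components`, and so on) are hypothetical. PCA is done via NumPy's SVD and clustering with plain Lloyd's k-means; each cluster is represented by the real generated sample nearest its centroid.

```python
import itertools
import numpy as np


def resample(stroke, n_points=32):
    """Resample a polyline (k x 2) to n_points spaced evenly by arc length."""
    stroke = np.asarray(stroke, dtype=float)
    seg = np.linalg.norm(np.diff(stroke, axis=0), axis=1)
    stroke = stroke[np.concatenate([[True], seg > 0])]  # drop repeated points
    if len(stroke) == 1:
        return np.repeat(stroke, n_points, axis=0)
    arc = np.concatenate([[0.0], np.cumsum(np.linalg.norm(np.diff(stroke, axis=0), axis=1))])
    t = np.linspace(0.0, arc[-1], n_points)
    return np.column_stack([np.interp(t, arc, stroke[:, 0]),
                            np.interp(t, arc, stroke[:, 1])])


def normalize(points):
    """Move to the centroid and scale to unit extent so only shape matters."""
    centred = points - points.mean(axis=0)
    scale = np.abs(centred).max()
    return centred / scale if scale > 0 else centred


def compose_word_part(letter_strokes):
    """Chain letter strokes so each starts where the previous one ended.
    ASSUMPTION: a simplified joining rule, not the paper's method."""
    pieces, pen = [], np.zeros(2)
    for stroke in letter_strokes:
        stroke = np.asarray(stroke, dtype=float)
        shifted = stroke - stroke[0] + pen
        pieces.append(shifted if not pieces else shifted[1:])
        pen = shifted[-1]
    return np.vstack(pieces)


def pca_project(X, n_components):
    """Project the rows of X onto their top principal components via SVD."""
    centred = X - X.mean(axis=0)
    _, _, vt = np.linalg.svd(centred, full_matrices=False)
    return centred @ vt[:n_components].T


def kmeans(Z, n_clusters, n_iter=100, seed=0):
    """Plain Lloyd's k-means; returns (labels, centroids)."""
    rng = np.random.default_rng(seed)
    centroids = Z[rng.choice(len(Z), size=n_clusters, replace=False)]
    for _ in range(n_iter):
        dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        updated = np.array([Z[labels == j].mean(axis=0) if np.any(labels == j)
                            else centroids[j] for j in range(n_clusters)])
        if np.allclose(updated, centroids):
            break
        centroids = updated
    # Final assignment so labels always match the returned centroids.
    dist = ((Z[:, None, :] - centroids[None, :, :]) ** 2).sum(axis=2)
    return dist.argmin(axis=1), centroids


def select_prototypes(letter_variants, n_clusters, n_points=32, n_components=4, seed=0):
    """Enumerate every combination of letter variants, embed each word-part
    with PCA, cluster with k-means, and keep the real sample closest to each
    centroid. Returns tuples of variant indices, one index per letter position."""
    combos = list(itertools.product(*[range(len(v)) for v in letter_variants]))
    X = np.array([
        normalize(resample(compose_word_part(
            [letter_variants[pos][i] for pos, i in enumerate(combo)]), n_points)).ravel()
        for combo in combos])
    Z = pca_project(X, min(n_components, X.shape[0], X.shape[1]))
    labels, centroids = kmeans(Z, min(n_clusters, len(combos)), seed=seed)
    prototypes = []
    for j, c in enumerate(centroids):
        members = np.flatnonzero(labels == j)
        if members.size:
            best = members[np.argmin(((Z[members] - c) ** 2).sum(axis=1))]
            prototypes.append(combos[best])
    return prototypes


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Three letter positions, each with four made-up "hand-drawn" variants.
    variants = [[np.cumsum(rng.normal(size=(6, 2)), axis=0) for _ in range(4)]
                for _ in range(3)]
    protos = select_prototypes(variants, n_clusters=6)
    print(len(protos), "prototypes out of", 4 ** 3, "combinations")
```

Returning one real sample per cluster (a medoid-like choice) rather than the centroid itself keeps every prototype a shape that was actually generated, since an averaged feature vector need not correspond to a plausible stroke.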
