Novel harmony search-based algorithms for part-of-speech tagging
  • Authors: Rana Forsati (1)
    Mehrnoush Shamsfard (1)

    1. Natural Language Processing (NLP) Research Lab, Faculty of Electrical and Computer Engineering, Shahid Beheshti University, G.C., Tehran, Iran
  • Keywords: Natural language processing (NLP); Part-of-speech (PoS) tagging; Harmony search algorithm; Evolutionary algorithms
  • Journal: Knowledge and Information Systems
  • Publication year: 2015
  • Publication date: March 2015
  • Volume: 42
  • Issue: 3
  • Pages: 709-736
  • Full-text size: 1,781 KB
  • Journal category: Computer Science
  • Journal subjects: Information Systems and Communication Service
    Business Information Systems
  • Publisher: Springer London
  • ISSN: 0219-3116
Abstract
Because fast, high-quality tagging is crucial in natural language processing, this paper presents novel language-independent algorithms based on the harmony search (HS) optimization method for the part-of-speech (PoS) tagging problem. The first proposed algorithm, called HSTAGger, is a framework for applying HS to PoS tagging. By modifying the HS algorithm and proposing more efficient objective functions, two improved versions of HSTAGger are also introduced. In addition, a novel class of problematic words, called "erroneous" words, and a method for handling them are proposed, for the first time to the best of our knowledge. To demonstrate the effectiveness of the proposed algorithms, we applied them to a standard annotated corpus and compared them with other evolutionary and classical PoS-tagging approaches. Experimental results indicate that the proposed algorithms outperform the taggers previously presented in the literature in terms of average precision.
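
The abstract does not spell out how HS is applied to tagging, so the following is only a minimal sketch of a generic discrete harmony search over tag sequences, not the HSTAGger algorithm itself. The lexicon, the probabilities, the bigram log-probability objective, and the names `score` and `harmony_search_tag` are all hypothetical choices made for illustration. The parameter names (`hms` for harmony memory size, `hmcr` for harmony memory considering rate, `par` for pitch adjusting rate) follow common HS terminology.

```python
import math
import random

# Toy lexicon and bigram model. All entries are illustrative assumptions,
# not data from the paper.
LEXICON = {
    "the": ["DET"],
    "can": ["NOUN", "VERB", "AUX"],
    "rusts": ["VERB", "NOUN"],
}
EMIT = {  # hypothetical P(word | tag)
    ("the", "DET"): 0.6,
    ("can", "NOUN"): 0.01, ("can", "VERB"): 0.005, ("can", "AUX"): 0.2,
    ("rusts", "VERB"): 0.002, ("rusts", "NOUN"): 0.001,
}
TRANS = {  # hypothetical P(tag | previous tag); "<s>" marks sentence start
    ("<s>", "DET"): 0.5,
    ("DET", "NOUN"): 0.7, ("DET", "VERB"): 0.05, ("DET", "AUX"): 0.01,
    ("NOUN", "VERB"): 0.4, ("NOUN", "NOUN"): 0.2,
    ("VERB", "VERB"): 0.05, ("VERB", "NOUN"): 0.3,
    ("AUX", "VERB"): 0.6, ("AUX", "NOUN"): 0.05,
}
FLOOR = 1e-6  # probability assigned to unseen events


def score(words, tags):
    """Objective (assumed): bigram log-probability of a tag sequence."""
    total, prev = 0.0, "<s>"
    for word, tag in zip(words, tags):
        total += math.log(TRANS.get((prev, tag), FLOOR))
        total += math.log(EMIT.get((word, tag), FLOOR))
        prev = tag
    return total


def harmony_search_tag(words, hms=5, hmcr=0.9, par=0.3, iterations=200, seed=0):
    """Search for a high-scoring tag sequence with a basic harmony search."""
    rng = random.Random(seed)
    candidates = [LEXICON[w] for w in words]  # raises KeyError on unknown words
    # Harmony memory: hms random taggings, one candidate tag per word.
    memory = [[rng.choice(c) for c in candidates] for _ in range(hms)]
    for _ in range(iterations):
        new = []
        for i, cands in enumerate(candidates):
            if rng.random() < hmcr:
                # Memory consideration: reuse position i of a stored harmony.
                tag = rng.choice(memory)[i]
                if rng.random() < par:
                    # Discrete stand-in for pitch adjustment: resample the tag.
                    tag = rng.choice(cands)
            else:
                # Random selection from the word's candidate tags.
                tag = rng.choice(cands)
            new.append(tag)
        # Replace the worst stored harmony if the new one scores higher.
        worst = min(range(hms), key=lambda k: score(words, memory[k]))
        if score(words, new) > score(words, memory[worst]):
            memory[worst] = new
    return max(memory, key=lambda h: score(words, h))


if __name__ == "__main__":
    sentence = ["the", "can", "rusts"]
    print(harmony_search_tag(sentence))
```

Because the search is stochastic, the returned tagging is a good candidate under the assumed objective rather than a guaranteed optimum; fixing `seed` only makes runs reproducible.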
