Arabic-English Domain Terminology Extraction from Aligned Corpora
  • Authors: Wiem Lahbib (23)
    Ibrahim Bounhas (23) (24)
    Bilel Elayeb (25) (26)
  • Keywords: Bilingual terminology; bilingual ontology; alignment process; parallel corpora; terminological research
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2014
  • Volume: 8841
  • Issue: 1
  • Pages: 745-759
  • Author affiliations:
    23. LISI Laboratory of Computer Science for Industrial Systems, Carthage University, Tunisia
    24. Higher Institute of Documentation (ISD), Manouba University, 2010, Tunisia
    25. RIADI Laboratory, The National School of Computer Science (ENSI), Manouba University, 2010, Tunisia
    26. Emirates College of Technology, P.O. Box: 41009, Abu Dhabi, United Arab Emirates
  • ISSN: 1611-3349
Abstract
The rapid growth of information sources has produced a large and constantly evolving body of electronically stored documents. Information Retrieval Systems (IRS) were developed in response to this growth and aim to help users identify relevant information. Recent IRS guide the user by providing domain knowledge in the form of controlled vocabularies or terminologies and, in turn, domain ontologies. In this context, it is necessary to develop multilingual termino-ontological resources. This paper proposes a new approach to bilingual domain terminology extraction for Arabic and English as a first step toward building a bilingual domain ontology, to be exploited in terminological search. The approach uses vocalized Arabic texts to reduce ambiguities and an alignment process to extract the English translations. To the best of our knowledge, the process implemented in our approach (morphological analysis, Arabic terminology extraction, alignment, and extraction of English translations) is the first work in the field of Arabic-English bilingual domain terminology extraction. The experimental results are encouraging: the rates of relevant term extraction across multiple domains, and of their translations, exceed 89%.
