Vietnamese Sentence Similarity Based on Concepts
  • Authors: Hien T. Nguyen (17)
    Phuc H. Duong (17)
    Vinh T. Vo (17)
  • Keywords: Paraphrase Identification; Text Similarity; Semantic Similarity
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2014
  • Volume: 8838
  • Issue: 1
  • Pages: 243-253
  • Full-text size: 242 KB
  • Author affiliations: Hien T. Nguyen (17)
    Phuc H. Duong (17)
    Vinh T. Vo (17)

    17. Faculty of Information Technology, Ton Duc Thang University, Vietnam
  • ISSN:1611-3349
Abstract
We propose a novel method for measuring the semantic similarity of two sentences. The originality of the method lies in the way it uses Wikipedia to explore the similarity of the concepts referred to in the sentences. The method also exploits Wiktionary to measure word-to-word similarity. The overall semantic similarity is a linear combination of word-to-word similarity, word-order similarity, and concept similarity. We build datasets consisting of 45 Vietnamese sentence pairs and evaluate the method on these datasets. The results show that, in the best cases, concept similarity helps improve the performance of our method by more than 15 percentage points. The proposed method is language-independent and quite easy to employ. Therefore, one can readily adopt it to measure semantic similarity for sentences written in other languages.
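
The linear combination described in the abstract can be illustrated with a minimal Python sketch. The abstract does not give the weights or say how the three component scores are computed, so the function name `combined_similarity`, the default weights, and the assumption that each score lies in [0, 1] are hypothetical choices made for illustration only, not the authors' settings.

```python
def combined_similarity(word_sim, order_sim, concept_sim,
                        weights=(0.5, 0.2, 0.3)):
    """Weighted linear combination of three similarity scores.

    The weights are illustrative placeholders (assumption); the abstract
    does not state the values used in the paper.
    """
    w_word, w_order, w_concept = weights
    # Require the weights to sum to 1 so the result stays in [0, 1].
    if abs(w_word + w_order + w_concept - 1.0) > 1e-9:
        raise ValueError("weights must sum to 1")
    # Each component score is assumed to be normalized to [0, 1].
    for score in (word_sim, order_sim, concept_sim):
        if not 0.0 <= score <= 1.0:
            raise ValueError("each similarity score must be in [0, 1]")
    return w_word * word_sim + w_order * order_sim + w_concept * concept_sim


# Example with made-up component scores for one sentence pair.
overall = combined_similarity(word_sim=0.8, order_sim=0.5, concept_sim=0.6)
print(round(overall, 2))
```

Setting the concept weight to zero (for example `weights=(0.7, 0.3, 0.0)`) gives a baseline without concept similarity, which is one way to picture the kind of comparison the abstract reports.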
