Author affiliations: Jinpeng Wang (21), Wayne Xin Zhao (21), Rui Yan (21), Haitian Wei (22), Jian-Yun Nie (23), Xiaoming Li (21)
21. Department of Computer Science and Technology, Peking University, China
22. School of International Trade and Economics, University of International Business and Economics, China
23. Département d'Informatique et de Recherche Opérationnelle, Université de Montréal, Montreal, H3C 3J7, Québec, Canada
ISSN:1611-3349
Abstract
In this paper we present a novel approach to name disambiguation based on two different types of semantic information: lexical and thematic. We propose using translation-based language models to address the synonymy problem when matching words, and a topic-based ranking function to capture rich thematic contexts for names. We test three ranking functions that combine lexical relatedness and thematic relatedness. Experiments on a Wikipedia data set and the TAC-KBP 2010 data set show that the proposed method is highly effective for name disambiguation.
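The abstract combines two signals into one ranking function: lexical relatedness from a translation-based language model, and thematic relatedness from topic distributions. The Python below is a minimal sketch under stated assumptions and does not reproduce the authors' implementation. It assumes a given translation table t(w|u) that includes self-translation pairs, and it assumes given per-entity topic distributions (in practice these would be estimated, for example with a topic model). Dirichlet smoothing, negative KL divergence as the thematic score, and a single linear interpolation weight `lam` are our own choices. That interpolation is a hypothetical stand-in for the three ranking functions the paper tests. All function names (`translation_lm_score`, `kl_divergence`, `combined_score`, `disambiguate`) and the toy data are hypothetical.

```python
import math
from collections import Counter


def translation_lm_score(query, doc, trans_prob, collection_prob, mu=1000.0):
    """Log-likelihood of the query words under a translation-based document LM.

    p_tr(w|d) = sum_u t(w|u) * p_ml(u|d), then Dirichlet-smoothed toward the
    collection model. trans_prob maps (w, u) -> t(w|u) and is assumed to
    contain self-translation pairs such as (w, w); this is an assumption.
    """
    counts = Counter(doc)
    doc_len = len(doc)
    score = 0.0
    for w in query:
        # Translation step: any document word u can "explain" query word w.
        p_tr = sum(trans_prob.get((w, u), 0.0) * c / doc_len
                   for u, c in counts.items())
        p_c = collection_prob.get(w, 1e-9)  # floor for words unseen in collection
        p = (doc_len * p_tr + mu * p_c) / (doc_len + mu)
        score += math.log(p)
    return score


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) for two discrete topic distributions of equal length."""
    return sum(pi * math.log((pi + eps) / (qi + eps))
               for pi, qi in zip(p, q) if pi > 0)


def combined_score(query, ctx_topics, entity, trans_prob, collection_prob,
                   lam=0.5, mu=1000.0):
    """Hypothetical linear interpolation of lexical and thematic relatedness."""
    lexical = translation_lm_score(query, entity["text"], trans_prob,
                                   collection_prob, mu)
    thematic = -kl_divergence(ctx_topics, entity["topics"])  # closer topics score higher
    return lam * lexical + (1.0 - lam) * thematic


def disambiguate(query, ctx_topics, candidates, trans_prob, collection_prob,
                 lam=0.5, mu=1000.0):
    """Return the candidate entity name with the highest combined score."""
    return max(candidates,
               key=lambda name: combined_score(query, ctx_topics,
                                               candidates[name], trans_prob,
                                               collection_prob, lam, mu))


if __name__ == "__main__":
    # Toy data (hypothetical): "nba" never appears in either entity text,
    # but t(nba | basketball) lets the basketball entity still match it.
    trans_prob = {
        ("bulls", "bulls"): 1.0,
        ("basketball", "basketball"): 0.7,
        ("nba", "basketball"): 0.3,
    }
    vocab = ["nba", "bulls", "basketball", "player",
             "machine", "learning", "professor"]
    collection_prob = {w: 0.01 for w in vocab}
    candidates = {
        "Jordan_basketball": {"text": ["basketball", "player", "bulls"],
                              "topics": [0.9, 0.1]},
        "Jordan_scientist": {"text": ["machine", "learning", "professor"],
                             "topics": [0.1, 0.9]},
    }
    print(disambiguate(["nba", "bulls"], [0.8, 0.2], candidates,
                       trans_prob, collection_prob, mu=10.0))
    # prints: Jordan_basketball
```

The translation step is what addresses synonymy. Without the `("nba", "basketball")` entry, the query word "nba" would receive only the smoothed collection probability from either candidate. The log-likelihood and the negative KL divergence sit on different scales, so in a real system `lam` (or a different combination function) would need tuning on held-out data.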