Effective Retrieval Model for Entity with Multi-valued Attributes: BM25MF and Beyond
  • Authors: Stéphane Campinas
    Renaud Delbru
    Giovanni Tummarello
  • Keywords: RDF; Entity Retrieval; Search; Ranking; Semi-Structured Data; BM25; BM25F; BM25MF; PL2; PL2F; PL2MF
  • Publication: Lecture Notes in Computer Science
  • Year: 2012
  • Volume: 7603
  • Issue: 1
  • Pages: 216-226
  • Full-text size: 343KB

Abstract
Entity retrieval is becoming increasingly prevalent as more structured information about entities becomes available on the Web in various forms, such as documents embedding metadata (RDF, RDFa, Microdata, Microformats). International benchmarking campaigns, e.g., the Text REtrieval Conference and the Semantic Search Challenge, propose entity-oriented search tracks, which reflects the need for effective search and discovery of entities. In this work, we present a multi-valued attributes model for entity retrieval that extends and generalises existing field-based ranking models. Our model introduces the concept of multi-valued attributes and enables attribute-specific and value-specific normalization and weighting. Based on this model, we extend two state-of-the-art field-based ranking models, BM25F and PL2F, and demonstrate through evaluations over heterogeneous datasets that the model significantly improves retrieval performance compared to existing models. Finally, we introduce query-dependent and query-independent weights specifically designed for our model, which provide significant performance improvement.
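
To make the idea of field-based ranking over multi-valued attributes more concrete, the following is a minimal Python sketch in the spirit of BM25F. It is an illustration only and should not be read as the BM25MF formula from the paper, which this record does not reproduce. The function names, the data layout (an entity as a mapping from attribute name to a list of tokenised values), the per-value length normalisation, and the parameter defaults `k1=1.2` and `b=0.75` are all assumptions made for this example.

```python
import math
from collections import Counter


def idf(n_docs, doc_freq):
    """Smoothed inverse document frequency (always non-negative)."""
    return math.log(1.0 + (n_docs - doc_freq + 0.5) / (doc_freq + 0.5))


def normalised_tf(term, value_tokens, avg_len, b):
    """Term frequency in one attribute value, normalised by the value length."""
    if not value_tokens:
        return 0.0
    tf = Counter(value_tokens)[term]
    return tf / (1.0 - b + b * len(value_tokens) / avg_len)


def score_entity(query_terms, entity, attr_weights, avg_value_len,
                 doc_freqs, n_docs, k1=1.2, b=0.75):
    """Score an entity whose attributes may each hold several values.

    entity:        {attribute: [list of tokens per value, ...]}
    attr_weights:  {attribute: weight}, defaulting to 1.0 (hypothetical)
    avg_value_len: {attribute: average value length in tokens}
    """
    score = 0.0
    for term in query_terms:
        pseudo_tf = 0.0
        for attr, values in entity.items():
            weight = attr_weights.get(attr, 1.0)
            # Assumption: each value is normalised separately, then summed.
            for tokens in values:
                pseudo_tf += weight * normalised_tf(
                    term, tokens, avg_value_len[attr], b)
        if pseudo_tf > 0.0:
            # BM25-style saturation of the combined pseudo-frequency.
            score += (idf(n_docs, doc_freqs.get(term, 0))
                      * pseudo_tf / (k1 + pseudo_tf))
    return score


if __name__ == "__main__":
    # Toy entity with a single-valued and a multi-valued attribute.
    example = {
        "label": [["entity", "search"]],
        "topic": [["semantic", "web"], ["entity", "retrieval", "model"]],
    }
    print(score_entity(["entity"], example, {"label": 2.0},
                       {"label": 2.0, "topic": 2.5}, {"entity": 3}, 10))
```

Normalising each value on its own, rather than concatenating all values of an attribute into one field, is one way to keep an attribute with many values from dominating the score. Whether this matches the normalisation used in the paper is not something this record can confirm.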
