Automatic ranking of retrieval models using retrievability measure
  • Authors: Shariq Bashir ; Andreas Rauber
  • Keywords: Retrieval models evaluation ; Retrieval bias analysis ; Automatic ranking of retrieval models ; Genetic programming
  • Journal: Knowledge and Information Systems
  • Publication year: 2014
  • Publication date: October 2014
  • Volume: 41
  • Issue: 1
  • Pages: 189-221
  • Full text size: 774 KB
  • Affiliations: Shariq Bashir (1)
    Andreas Rauber (1)

    1. Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
  • ISSN: 0219-3116
Abstract
Analyzing retrieval model performance using retrievability (maximizing the findability of documents) has recently emerged as an important measure for recall-oriented retrieval applications. Most work in this domain focuses either on analyzing retrieval model bias or on proposing retrieval strategies that increase document retrievability. However, little is known about the relationship between retrievability and other information retrieval effectiveness measures such as precision, recall, and MAP. In this study, we analyze the relationship between retrievability and effectiveness measures. Our experiments on the TREC chemical retrieval track dataset reveal that these two independent goals of information retrieval, maximizing the retrievability of documents and maximizing the effectiveness of retrieval models, are closely related to each other. This correlation provides an attractive alternative for evaluating, ranking, or optimizing the effectiveness of retrieval models on a given corpus without requiring any ground truth (relevance judgments).
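
The idea of ranking retrieval models by retrieval bias can be illustrated with a minimal Python sketch. It makes several assumptions that the abstract does not spell out: retrievability is taken to be the cumulative, rank-cutoff count (a document scores one point for every query that ranks it within the cutoff), bias is summarized by a normalized Gini coefficient over those scores, and a lower Gini is treated as a proxy for higher effectiveness. The function names, model names, and toy runs are hypothetical and are not results from the paper.

```python
from collections import Counter


def retrievability(rankings, corpus, cutoff):
    """Count, for each document, the queries that rank it within the cutoff."""
    counts = Counter()
    for ranked_docs in rankings.values():
        counts.update(ranked_docs[:cutoff])
    # Documents that are never retrieved keep a score of 0.
    return {doc_id: counts[doc_id] for doc_id in corpus}


def gini(scores):
    """Normalized Gini coefficient: 0 = all equal, 1 = one document gets everything."""
    xs = sorted(scores)
    n, total = len(xs), sum(xs)
    if n < 2 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / ((n - 1) * total)


def rank_models_by_bias(runs, corpus, cutoff):
    """Order model names from least to most retrieval bias (ascending Gini)."""
    bias = {
        name: gini(retrievability(rankings, corpus, cutoff).values())
        for name, rankings in runs.items()
    }
    return sorted(bias, key=bias.get), bias


# Toy, made-up runs: two hypothetical models, three queries, four documents.
corpus = ["d1", "d2", "d3", "d4"]
runs = {
    "model_a": {"q1": ["d1", "d2"], "q2": ["d3", "d4"], "q3": ["d2", "d3"]},
    "model_b": {"q1": ["d1", "d2"], "q2": ["d1", "d3"], "q3": ["d1", "d2"]},
}
order, bias = rank_models_by_bias(runs, corpus, cutoff=2)
print(order, bias)
```

No relevance judgments enter the computation: only the ranked lists and the corpus are needed. Whether the resulting order agrees with an effectiveness-based order such as MAP is the empirical question the study examines.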