Automatic ranking of retrieval models using retrievability measure
- Authors: Shariq Bashir; Andreas Rauber
- Keywords: Retrieval models evaluation; Retrieval bias analysis; Automatic ranking of retrieval models; Genetic programming
- Journal: Knowledge and Information Systems
- Publication year: 2014
- Publication date: October 2014
- Volume: 41
- Issue: 1
- Pages: 189-221
- Full-text size: 774 KB
- Affiliations: Shariq Bashir (1)
Andreas Rauber (1)
1. Institute of Software Technology and Interactive Systems, Vienna University of Technology, Vienna, Austria
- ISSN:0219-3116
Abstract
Analyzing retrieval model performance using retrievability (maximizing the findability of documents) has recently emerged as an important measurement for recall-oriented retrieval applications. Most work in this domain focuses either on analyzing retrieval model bias or on proposing retrieval strategies that increase document retrievability. However, little is known about the relationship between retrievability and other information retrieval effectiveness measures such as precision, recall, and MAP. In this study, we analyze the relationship between retrievability and effectiveness measures. Our experiments on the TREC chemical retrieval track dataset reveal that these two independent goals of information retrieval, maximizing the retrievability of documents and maximizing the effectiveness of retrieval models, are closely related to each other. This correlation provides an attractive alternative for evaluating, ranking, or optimizing the effectiveness of retrieval models on a given corpus without requiring any ground truth (relevance judgments).
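The abstract does not spell out how retrievability is computed. As a rough illustration only, the sketch below assumes a common formulation from the retrievability literature: a document's score is the number of queries that rank it within a cutoff, and the Gini coefficient over all scores summarizes how unevenly a retrieval model exposes the corpus. The function names, the cutoff value, and the toy rankings are hypothetical and are not taken from the paper.

```python
from collections import Counter


def retrievability(rankings, cutoff):
    """Count, per document, how many queries rank it within the top `cutoff`.

    `rankings` maps a query id to its ranked list of document ids.
    This cumulative, cutoff-based scoring is an assumed formulation.
    """
    scores = Counter()
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return scores


def gini(values):
    """Gini coefficient of non-negative values (0.0 means perfectly equal)."""
    xs = sorted(values)
    n = len(xs)
    total = sum(xs)
    if n == 0 or total == 0:
        return 0.0
    weighted = sum((2 * i - n - 1) * x for i, x in enumerate(xs, start=1))
    return weighted / (n * total)


# Toy data: three queries over a four-document corpus (hypothetical).
corpus = ["d1", "d2", "d3", "d4"]
rankings = {
    "q1": ["d1", "d2", "d3"],
    "q2": ["d1", "d3", "d2"],
    "q3": ["d1", "d2", "d4"],
}

scores = retrievability(rankings, cutoff=2)
# Include documents that were never retrieved; Counter yields 0 for them.
bias = gini([scores[doc_id] for doc_id in corpus])
print(dict(scores), round(bias, 3))
```

Under this reading, a lower Gini value indicates less retrieval bias. Several retrieval models could then be ordered by their Gini values on the same query set, which is the kind of judgment-free ranking the abstract alludes to.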