Several Studies on Link Analysis Algorithms for the WWW
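Abstract
The emergence of the WWW has challenged traditional information retrieval techniques. With no breakthrough in traditional information retrieval so far, this thesis starts from the characteristics of Web data itself and mines the Web's most abundant resource, the hyperlink: it searches through hyperlinks and builds an effective model of Web information retrieval to find the information we need. On this premise, the thesis studies link analysis algorithms for pages in depth and examines the role of hyperlinks in Web retrieval at three levels: theory, algorithms, and applications. The main contributions are as follows: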
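     First, while analyzing and implementing existing link analysis algorithms, we found that, depending on the data environment and the retrieval requirements, the preprocessing method, iteration rule, and termination condition that an algorithm applies to different types of links all affect the query results. We propose constraint conditions for link analysis algorithms on closed data collections. By comparing the hyperlink distributions of closed collections and of the real Web, we extend these constraints to the real Web environment, so that the effect of link analysis algorithms can be predicted more accurately. Experiments show that under these constraints link analysis algorithms can effectively improve retrieval efficiency.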
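     Second, we optimize the query-independent, pre-query link analysis algorithm and obtain the optimized algorithm Modilink(). It specifies a preprocessing method for hyperlinks, an adjusted normalization method, and a complete set of rules for deciding when the iteration terminates. Experiments show that the algorithm improves overall iteration efficiency.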
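The abstract does not give Modilink's formulas. As a point of reference only, the three ingredients it names (link preprocessing, normalization, and an explicit termination rule) can be seen in a minimal sketch of the standard PageRank iteration, which the English abstract names as the baseline. The function name `pagerank`, the damping factor 0.85, the treatment of pages without out-links, and the L1 tolerance are all assumptions made for illustration, not the thesis's choices.

```python
def pagerank(links, damping=0.85, tol=1e-8, max_iter=200):
    """Standard PageRank by power iteration (a sketch, not Modilink).

    links: dict mapping each page to the list of pages it links to.
    Returns (rank dict, number of iterations performed).
    """
    # Preprocessing (assumed): collect every page, drop self-links and
    # duplicate links so each remaining link is counted once.
    pages = sorted(set(links) | {q for outs in links.values() for q in outs})
    if not pages:
        return {}, 0
    out = {p: sorted(set(links.get(p, [])) - {p}) for p in pages}

    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    iteration = 0
    for iteration in range(1, max_iter + 1):
        # Normalization: rank held by pages without out-links is spread
        # uniformly, so the vector keeps summing to 1.
        dangling = sum(rank[p] for p in pages if not out[p])
        new = {p: (1.0 - damping) / n + damping * dangling / n for p in pages}
        for p in pages:
            if out[p]:
                share = damping * rank[p] / len(out[p])
                for q in out[p]:
                    new[q] += share
        # Termination rule: stop when the L1 change between successive
        # rank vectors falls below the tolerance.
        delta = sum(abs(new[p] - rank[p]) for p in pages)
        rank = new
        if delta < tol:
            break
    return rank, iteration


if __name__ == "__main__":
    graph = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
    scores, iterations = pagerank(graph)
    for page in sorted(scores, key=scores.get, reverse=True):
        print(page, round(scores[page], 4))
    print("iterations:", iterations)
```

Making the termination test explicit is what allows iteration counts to be compared between variants, which is the kind of efficiency comparison the abstract reports.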
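     We propose QHA1 (quality based hyperlink analysis algorithm), a query-dependent, post-query link analysis algorithm extended with a page-quality factor. It feeds the result of Modilink() into the hyperlink weighting scheme as a measure of page quality, so that a hyperlink reflects more objectively how strongly the pages it connects influence each other. In addition, by also taking the source of a hyperlink into account when assigning its weight, and combining this with the page-quality factor, we propose a second optimized post-query algorithm, QHA2. For these optimized post-query algorithms we prove correctness and feasibility theoretically and verify them experimentally.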
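The abstract does not state how QHA1 turns page quality into link weights. The sketch below only illustrates the general idea on a HITS-style hub/authority iteration: each link carries a weight derived from a query-independent quality score. The weighting rule (product of the two endpoint scores), the function names, and the sample scores are hypothetical.

```python
import math


def _normalize(vector):
    """Scale a score dict to unit Euclidean length."""
    norm = math.sqrt(sum(x * x for x in vector.values())) or 1.0
    return {k: x / norm for k, x in vector.items()}


def weighted_hits(edges, quality, tol=1e-9, max_iter=100):
    """HITS-style iteration with quality-weighted links (illustrative).

    edges: iterable of (source, target) page pairs.
    quality: dict of query-independent page scores; missing pages get 1.0.
    Returns (authority dict, hub dict).
    """
    edges = list(edges)
    pages = sorted({p for edge in edges for p in edge})
    if not pages:
        return {}, {}
    # Hypothetical weighting: a link counts more when both of its
    # endpoints have a high quality score. Self-links are ignored.
    weight = {
        (p, q): quality.get(p, 1.0) * quality.get(q, 1.0)
        for p, q in edges
        if p != q
    }
    auth = {p: 1.0 for p in pages}
    hub = {p: 1.0 for p in pages}
    for _ in range(max_iter):
        # Authority update: weighted sum of the hubs pointing at a page.
        new_auth = {p: 0.0 for p in pages}
        for (p, q), w in weight.items():
            new_auth[q] += w * hub[p]
        # Hub update: weighted sum of the authorities a page points at.
        new_hub = {p: 0.0 for p in pages}
        for (p, q), w in weight.items():
            new_hub[p] += w * new_auth[q]
        new_auth, new_hub = _normalize(new_auth), _normalize(new_hub)
        delta = max(abs(new_auth[p] - auth[p]) for p in pages)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return auth, hub


if __name__ == "__main__":
    links = [("h1", "a"), ("h2", "a"), ("h1", "b"), ("spam", "b")]
    scores = {"h1": 0.9, "h2": 0.8, "a": 0.7, "b": 0.7, "spam": 0.05}
    authority, hubs = weighted_hits(links, scores)
    print(sorted(authority.items(), key=lambda kv: -kv[1]))
```

In this toy graph pages `a` and `b` each have two in-links, so unweighted link counting cannot separate them; with the assumed weights, the link from the low-quality `spam` page contributes little and `a` ranks above `b`.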
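     Borrowing from latent semantic analysis, the thesis introduces matrix singular value decomposition (SVD) into post-query link analysis and proposes an SVD-based noise-filtering algorithm. The algorithm uses the SVD to filter out irrelevant pages and hyperlinks and is applied when constructing the initial base set for query-dependent, post-query link analysis. On this basis we propose the optimized post-query algorithms QHA3 and QHA4, which effectively control topic drift and offer a good route to accurate search.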
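The abstract gives no details of the filtering rule, so the following is only a sketch of how a truncated SVD can separate a dominant link community from unrelated links, in the spirit of latent semantic analysis. It assumes NumPy is available; the function name `svd_filter`, the rank `k`, and the `keep_ratio` threshold are hypothetical choices, not the thesis's.

```python
import numpy as np


def svd_filter(adjacency, k=1, keep_ratio=0.5):
    """Rank-k SVD filtering of a base-set link matrix (illustrative).

    adjacency[i][j] is 1 if page i links to page j, else 0.
    Returns (rank-k approximation, indices of the pages kept).
    """
    A = np.asarray(adjacency, dtype=float)
    U, s, Vt = np.linalg.svd(A, full_matrices=False)
    # Keep only the k strongest singular directions; weaker ones are
    # treated as noise.
    A_k = (U[:, :k] * s[:k]) @ Vt[:k, :]
    # Assumed scoring: how strongly each page still takes part in the
    # reduced link structure, as a link source (row) and target (column).
    strength = np.linalg.norm(A_k, axis=1) + np.linalg.norm(A_k, axis=0)
    keep = np.flatnonzero(strength >= keep_ratio * strength.max())
    return A_k, keep


if __name__ == "__main__":
    # Pages 0 and 1 both link to pages 2 and 3 (one tight community);
    # the single link 4 -> 5 is unrelated to it.
    base_set = [
        [0, 0, 1, 1, 0, 0],
        [0, 0, 1, 1, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 0],
        [0, 0, 0, 0, 0, 1],
        [0, 0, 0, 0, 0, 0],
    ]
    filtered, kept = svd_filter(base_set, k=1)
    print("pages kept:", kept.tolist())
```

Running a post-query algorithm on the kept pages instead of the full base set is one way off-topic pages could be prevented from pulling the result toward another topic, which is the topic-drift problem the abstract refers to.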
The emergence of the WWW has introduced new challenges for traditional information retrieval (IR) technologies. Web searching draws on theories and techniques from applied mathematics (such as graph theory, matrix theory and analysis), data mining, AI, NLP, etc. The core of search engine technology is finding a better search algorithm. Given the characteristics of Web data, the hyperlinks among web pages can be mined for more useful information, and searching with hyperlinks can yield a more effective Web information retrieval model. This dissertation studies how hyperlinks affect Web IR theories, algorithms and applications.
    First, by comparing hyperlink analysis algorithms across different data environments and retrieval requirements, I analyzed how the search results are affected by the way different types of links are processed and by the choice of iteration rules and terminating conditions. I then proposed restricting conditions for hyperlink analysis algorithms on closed data sets. By comparing the hyperlink distributions of closed data sets and of the real Web environment, I extended the restricting conditions to the real Web environment. In this way the effect of an algorithm can be predicted quantitatively, and the experimental results show that retrieval efficiency can be improved greatly.
    Then, new optimized hyperlink analysis algorithms are proposed. One of them is Modilink. This query-independent approach introduces new preprocessing algorithms, adjusted normalization methods and iteration terminating conditions. It also modifies the iterative formula of the PageRank algorithm to improve the overall iterative efficiency. The experimental results show that Modilink converges faster than the PageRank algorithm and that, under the restricting conditions, retrieval efficiency can be improved.
    The other optimized hyperlink analysis algorithms are query-dependent. Considering the relationship between web page quality and the characteristics of hyperlink analysis algorithms, I proposed QHA1, a quality based hyperlink analysis algorithm. The core of this algorithm is to take the value from Modilink as the web page quality factor in the
