Research on Dimensionality Reduction Algorithms and Their Application to Large-Scale Text Data Mining
Abstract
With the rapid development of the Internet, we live in an age of "information explosion" and routinely face the task of analyzing and processing massive data, which continues to grow at a geometric rate. Moreover, real-world massive data are usually high-dimensional, sparse, and highly redundant. Effective methods that compress high-dimensional massive data while preserving its intrinsic properties have therefore become an important research topic in artificial intelligence, machine learning, data mining, and related fields. Efficient dimensionality reduction algorithms are one such method and have real practical value. This thesis focuses on fast dimensionality reduction algorithms suited to high-dimensional massive data and on their concrete applications.
     This thesis proposes two new dimensionality reduction algorithms: (1) Direct Random Projection with an expected distortion bound (DRP); (2) Anchor-points-based Isometric Embedding under a least-squares error criterion (AIE).
     DRP has a time complexity of O(dn), a performance result established through an analysis of its expected distortion. We prove 1) a bound on the expected distortion of DRP, and 2) that under moderate conditions a DRP mapping whose expected distortion stays within a suitable range can be found in O(1) random time. We further propose a heuristic for obtaining a good DRP. The algorithm offers a robust asymptotic speedup and is more stable than other random projection algorithms. In the streaming-data setting, an incremental strategy brings the time complexity of DRP to O(d log d). AIE has a time complexity of O(n log n); once the geodesic distances have been obtained, its running time is linear in the number of embedded points, and it can be fully parallelized. Compared with nonlinear dimensionality reduction algorithms such as Isomap and LLE, it has better time complexity.
     Current mainstream search engines generate query results from the frequency of query terms in web pages, supplemented by information such as page authority. However, user queries are often very short, and in many cases a search engine cannot determine the user's intent. This thesis proposes a method for mining page content relevance from the massive click data in Web logs and, on that basis, presents a Feedback Search Engine (FSE) framework and related algorithms. FSE generates query results dynamically according to page relevance, aiming to give users more pertinent and personalized information.
With the rapid development of the Internet, people in this age of "information explosion" often need to analyze and process massive data, and the amount of data keeps growing at a geometric rate. In the real world, such massive data are usually high-dimensional and sparse, and contain a great deal of redundancy. Compressing massive data while preserving its internal properties has therefore become one of the important research topics in artificial intelligence, machine learning, data mining, and other fields. Efficient dimensionality reduction algorithms are an effective way to process high-dimensional massive data and have practical application value. This paper focuses on the research and application of fast dimensionality reduction algorithms applicable to such massive data.
     The paper proposes two new dimensionality reduction algorithms: the first is Direct Random Projection with an expected distortion bound (DRP); the second is Anchor-points-based Isometric Embedding under a least-squares error criterion (AIE). DRP has a time complexity of O(dn), and its performance is investigated through an analysis of the expected distortion. We prove: 1) an expected distortion bound for DRP; and 2) that, under moderate conditions, a DRP with an appropriate expected distortion can be found in O(1) random time. Furthermore, we propose a simple heuristic to facilitate finding an appropriate DRP. Experiments suggest that DRP is more stable than the other two random projection algorithms compared. Using an incremental strategy, the total time cost of DRP is O(d log d) in the streaming-data setting.
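     The abstract does not reproduce the DRP construction itself, so the following Python sketch only illustrates the generic random-projection idea that DRP builds on and is compared against: a dense Gaussian projection (which costs O(ndk), unlike DRP's O(dn)), together with a rough empirical check of the average distance distortion. The function names and parameter choices are hypothetical and are not taken from the thesis.

```python
import numpy as np

def gaussian_random_projection(X, k, seed=0):
    """Project n points in R^d down to R^k with a dense Gaussian random matrix.

    Illustrative only: the thesis's DRP uses its own "direct" construction with
    an O(dn) cost and a proven expected-distortion bound; this shows the generic
    random-projection idea it is compared against.
    """
    rng = np.random.default_rng(seed)
    d = X.shape[1]
    # Scaling by 1/sqrt(k) preserves squared Euclidean distances in expectation.
    R = rng.normal(0.0, 1.0 / np.sqrt(k), size=(d, k))
    return X @ R

def mean_distortion(X, Y, n_pairs=1000, seed=0):
    """Average ratio of projected to original pairwise distances -- a rough
    empirical stand-in for the expected distortion analyzed in the thesis."""
    rng = np.random.default_rng(seed)
    i = rng.integers(0, len(X), n_pairs)
    j = rng.integers(0, len(X), n_pairs)
    keep = i != j
    d_orig = np.linalg.norm(X[i[keep]] - X[j[keep]], axis=1)
    d_proj = np.linalg.norm(Y[i[keep]] - Y[j[keep]], axis=1)
    return float(np.mean(d_proj / d_orig))

X = np.random.rand(500, 5000)            # 500 points in 5000 dimensions
Y = gaussian_random_projection(X, k=100)
print(Y.shape, mean_distortion(X, Y))    # (500, 100) and a ratio close to 1
```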
     AIE has a time complexity of O(n log n); once the geodesic distances have been obtained, its running time is linear in the number of embedded points, and it can be fully parallelized. Compared with nonlinear dimensionality reduction algorithms such as Isomap and LLE, AIE has better time complexity.
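     AIE's exact least-squares criterion is not spelled out in this abstract. As a rough illustration of the anchor-based idea only (geodesic distances to a small anchor set, classical MDS on the anchors, then independent least-squares placement of every other point, in the style of Landmark MDS), here is a minimal Python sketch; it is an assumption-laden stand-in, not the thesis's algorithm, and the anchor count, neighborhood size, and function names are hypothetical.

```python
import numpy as np
from scipy.sparse.csgraph import dijkstra
from sklearn.neighbors import kneighbors_graph

def anchor_isometric_embedding(X, n_anchors=20, k_neighbors=8, dim=2, seed=0):
    """Anchor-based isometric embedding, Landmark-MDS style (illustrative only)."""
    n = X.shape[0]
    rng = np.random.default_rng(seed)
    anchors = rng.choice(n, size=n_anchors, replace=False)

    # Geodesic distances from the anchors to every point, via Dijkstra on a kNN graph.
    knn = kneighbors_graph(X, k_neighbors, mode="distance")
    D2 = dijkstra(knn, directed=False, indices=anchors) ** 2   # (n_anchors, n)

    # Classical MDS on the anchor-to-anchor squared distances.
    A2 = D2[:, anchors]
    J = np.eye(n_anchors) - 1.0 / n_anchors                    # centering matrix
    B = -0.5 * J @ A2 @ J
    w, V = np.linalg.eigh(B)
    top = np.argsort(w)[::-1][:dim]
    L = V[:, top] * np.sqrt(np.maximum(w[top], 0.0))           # anchor coordinates

    # Least-squares placement of all points from their anchor distances
    # (independent per point, hence trivially parallelizable).
    delta_mu = A2.mean(axis=0)
    Y = -0.5 * np.linalg.pinv(L) @ (D2 - delta_mu[:, None])
    return Y.T                                                 # (n, dim)

# Usage: embed a random 3-D point cloud of 2000 points into 2-D.
X = np.random.rand(2000, 3)
print(anchor_isometric_embedding(X, n_anchors=30, k_neighbors=10).shape)  # (2000, 2)
```

     Because each non-anchor point is placed independently from its distances to the anchors, the post-geodesic stage in this kind of scheme is linear in the number of embedded points and parallelizes trivially, which is consistent with the properties the abstract claims for AIE.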
     Current mainstream search engines generate results by analyzing statistical information such as the frequency of query terms in web pages and the authority ranking of the pages. However, user queries are often very short, so in many situations search engines cannot determine what kind of information the user wants. This paper describes a method for mining web content relevance from large amounts of clickthrough data in web logs. Based on this method, we further present a Feedback Search Engine (FSE) framework and its associated algorithms. Using page-to-page relevance, FSE generates search results dynamically, providing users with more accurate and personalized information.
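     The abstract does not give the actual FSE mining algorithm; as a hedged illustration of how page-to-page relevance might be mined from clickthrough data, the sketch below treats two pages as related when users click both for the same queries and scores the relation by cosine similarity of their query-click counts. The data format and names are hypothetical.

```python
from collections import defaultdict
from itertools import combinations
import math

def page_relevance(click_log):
    """Estimate page-to-page relevance from query click logs (illustrative sketch).

    click_log: iterable of (query, clicked_url) pairs.  Two pages are treated as
    related when users click both for the same queries; relevance is the cosine
    similarity of their query-click count vectors.  This is a hypothetical
    stand-in for the thesis's mining method, not its actual algorithm.
    """
    clicks = defaultdict(lambda: defaultdict(int))   # url -> query -> click count
    for query, url in click_log:
        clicks[url][query] += 1

    norms = {u: math.sqrt(sum(c * c for c in q.values())) for u, q in clicks.items()}
    relevance = {}
    for u, v in combinations(clicks, 2):
        dot = sum(clicks[u][q] * clicks[v].get(q, 0) for q in clicks[u])
        if dot:
            relevance[(u, v)] = dot / (norms[u] * norms[v])
    return relevance

log = [("manifold learning", "a.html"), ("manifold learning", "b.html"),
       ("isomap", "a.html"), ("isomap", "b.html"), ("pagerank", "c.html")]
print(page_relevance(log))   # {('a.html', 'b.html'): 1.0}
```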