Research on Web Information Retrieval Techniques Based on Graph Learning
Abstract
With the rapid growth of the Internet and the World Wide Web (WWW), the Web has gradually become an indispensable source of information in people's daily lives. It has brought new challenges as well as opportunities to information retrieval technology. Over the past decade or more, information retrieval has changed from a purely academic discipline into the technical foundation of information acquisition for most people.
With the spread of the Web 2.0 concept, the Web is no longer only a huge information repository but increasingly a platform on which users participate and communicate. The rapid growth of Web 2.0 applications will drive another round of innovation in information retrieval technology. This thesis argues that, in the Web 2.0 era, information retrieval technology shows three main trends: 1) More flexible personalized information services. As the number of users grows rapidly, Web 2.0 sites urgently need to satisfy users' personalized information needs. However, traditional Web information retrieval techniques are not well suited to the complex structured data of Web 2.0 applications, which call for more flexible personalized services such as recommender systems. 2) More effective multimedia retrieval techniques. Web 2.0 sites make it easy for users to upload and share multimedia data such as pictures and videos, and the resulting rapid growth of multimedia data has made multimedia information retrieval a focus of attention. 3) Specialization of retrieval services. User-generated data in Web 2.0 applications has become a significant part of the Web, and this large and topically diverse data is pushing Web information retrieval toward domain-specific and topic-specific retrieval.
Much Web data has a complex intrinsic relational structure. The thesis points out that, to better address retrieval problems on such data and improve retrieval effectiveness, the knowledge contained in this structure must be fully exploited. Graph-based learning techniques can model complex relational structures well and capture the knowledge they contain. In light of the trends above, this thesis therefore focuses on graph learning based Web information retrieval, studies the following four research problems in depth, and proposes novel graph learning algorithms for them:
1) Personalized tag recommendation in social tagging services: in social tagging services, users can freely add tags to resources, and the resulting tagging data can be modeled naturally as a graph. This thesis proposes a novel graph-based ranking algorithm for multi-type interrelated objects to solve the personalized tag recommendation problem.
2) Personalized document recommendation in social tagging services: traditional recommender systems focus on rating data, whereas the tagging data in social tagging services is a different kind of data with its own graph structure. This thesis proposes a novel graph-based dimensionality reduction (semantic space learning) algorithm for multi-type interrelated objects that maps users, tags, and documents into the same semantic space. Documents are then recommended according to the Euclidean distance between users and documents.
3) Face image retrieval and recognition: traditional work on face retrieval and recognition uses dimensionality reduction (subspace learning) to obtain a high-level feature representation of face images. A recently proposed graph-based second-order tensor subspace learning algorithm performs well on face images, but its time complexity is high. This thesis proposes a novel, efficient graph-based second-order tensor subspace learning algorithm that lowers the time complexity of learning the subspace projection function while keeping retrieval and recognition performance acceptable.
4) Focused crawling for high-quality topical Web resources: focused crawling is an important technique for harvesting topic-relevant resources from the Web. For vertical search engines, one of the key problems is how to find high-quality relevant resources on the Web. This thesis proposes a novel Web-graph-based on-line algorithm for estimating the topical quality of Web pages and, building on it, designs a focused crawler for harvesting high-quality topical Web resources.
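The focused-crawling idea in topic 4 can be illustrated with a generic best-first frontier. This is only a minimal sketch, not the thesis's on-line topical quality algorithm, which the abstract does not specify. The names `focused_crawl`, `toy_web`, and `toy_score` are hypothetical, and the scores are placeholder values standing in for whatever topical quality estimate a real crawler would compute.

```python
import heapq

def focused_crawl(seeds, fetch_links, score, budget):
    """Best-first crawl: always expand the highest-scoring known URL."""
    # heapq is a min-heap, so scores are negated to pop the best URL first.
    frontier = [(-score(u), u) for u in seeds]
    heapq.heapify(frontier)
    seen = set(seeds)
    visited = []
    while frontier and len(visited) < budget:
        _, url = heapq.heappop(frontier)
        visited.append(url)
        for link in fetch_links(url):
            if link not in seen:
                seen.add(link)
                heapq.heappush(frontier, (-score(link), link))
    return visited

# Toy in-memory "web": URL -> outgoing links (hypothetical data).
toy_web = {
    "seed": ["a", "b"],
    "a": ["c"],
    "b": ["d"],
    "c": [],
    "d": [],
}
# Placeholder quality scores (assumed, not computed from page content).
toy_score = {"seed": 1.0, "a": 0.2, "b": 0.9, "c": 0.8, "d": 0.1}

order = focused_crawl(["seed"], lambda u: toy_web.get(u, []), toy_score.get, budget=4)
print(order)
```

Keeping the frontier in a priority queue means the crawl budget is spent on the pages currently estimated to be most valuable, which is the general motivation behind focused crawlers.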
Finally, the thesis summarizes this work and discusses the outlook for graph learning based Web information retrieval.
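The recommendation step described in topic 2, ranking documents by their Euclidean distance to a user in a shared semantic space, can be sketched as follows. The sketch assumes the embeddings have already been learned. The function name `recommend` and the 2-D vectors are hypothetical illustrations, and the thesis's own embedding algorithm is not reproduced here.

```python
import math

def recommend(user_vec, doc_vecs, k):
    """Return the k document ids closest to the user in Euclidean distance."""
    ranked = sorted(doc_vecs, key=lambda d: math.dist(user_vec, doc_vecs[d]))
    return ranked[:k]

# Hypothetical 2-D embeddings; a real system would learn these from tagging data.
user = (0.0, 0.0)
docs = {"d1": (3.0, 4.0), "d2": (1.0, 0.0), "d3": (0.0, 2.0)}

top = recommend(user, docs, k=2)
print(top)
```

Because users and documents live in the same space, no separate similarity model is needed at recommendation time: a nearest-neighbor lookup suffices.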
