Research on Retrieval Techniques for Heterogeneous Information Networks
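Abstract
In the real world, information objects influence and interact with the objects around them in different aspects, at different levels, and in different ways, and together they form complex information networks. Information networks help us represent and store the essential information of the real world, and analyzing their link information makes them a useful tool for mining hidden information. Mining information and acquiring knowledge from information networks has therefore become an active research topic. After reviewing the state of research on information networks, and heterogeneous information networks in particular, this dissertation constructs heterogeneous information networks by analyzing the relationships between documents and their related objects, and studies key information retrieval techniques: semi-supervised learning, document clustering, label extraction for search result clusters, and query suggestion. The main work and contributions are as follows: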
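     (1) A framework is proposed for constructing a heterogeneous information network from the content features of queries and documents and the click relationships between them, and for semi-supervised learning on that network. Feature-based similarity graphs are built separately from the content features of queries and of documents, a query-document bipartite graph is built from click relationships, and discriminative information from labeled samples is introduced to strengthen the network structure. A regularization framework and a label propagation algorithm for semi-supervised learning on the query-document heterogeneous information network are proposed. Given a small number of labels, the method makes fuller use of the content of queries and documents and propagates labels across the relationships between them; experiments show that it outperforms traditional semi-supervised learning methods.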
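     (2) A graph-regularized semi-supervised learning framework is established for high-order heterogeneous information networks that contain multiple types of objects and links. Within this framework, graph regularization distinguishes the semantics of different link types, and a cost function is proposed that fully preserves the smoothness of the spatial structure jointly revealed by labeled and unlabeled samples; its closed-form solution is derived. A label propagation algorithm for high-order heterogeneous information networks is proposed, in which label information spreads from labeled nodes to neighboring nodes until a steady state is reached, and the algorithm is proved to converge to the closed-form solution of the cost function. Several classical semi-supervised learning algorithms arise as special cases of the framework.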
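     (3) Two topic propagation models, TP-TS and TP-Unify, are proposed for rich-text query-document heterogeneous information networks. TP-TS treats topic modeling and the random walk as two independent processes: it first builds a topic model of the text content with probabilistic latent semantic analysis (PLSA), and then propagates topic information across the heterogeneous query-document bipartite graph to reveal the topics of the different nodes and assign them to categories. TP-Unify introduces consistency constraints between heterogeneous nodes of the network into topic analysis, so that network structure analysis is combined with topic modeling in a single step.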
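     (4) A new method for extracting cluster labels is proposed. Its basic idea is to recast label extraction as the problem of ranking the query terms related to a cluster, which avoids extracting topic words from clusters of web documents. An algorithm is proposed that jointly ranks query terms and web pages by fusing the query-page click graph, the page similarity graph, and the link graph, effectively integrating the judgments of users, web page creators, and web page writers.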
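     (5) Query suggestion techniques based on log analysis and on semantic analysis are combined: a Term-Query-URL heterogeneous information network is constructed so that log information and semantic information are analyzed together, and query-based random walk with restart is used to generate suggestions. Collaborative recommendation from click logs works well for high-frequency queries, while training the semantic relationships between terms and queries with a document-based method improves suggestions for sparse queries. Experiments on the query log of a large-scale commercial search engine show that the method outperforms existing query suggestion methods.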
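The label propagation idea in contributions (1) and (2) can be illustrated with a minimal sketch. It is not the dissertation's algorithm: it assumes the standard "local and global consistency" update F <- alpha * S * F + (1 - alpha) * Y on a single combined graph, and the toy nodes, edge weights, `alpha`, and the function name `propagate_labels` are all hypothetical.

```python
import math


def propagate_labels(weights, labels, alpha=0.8, iterations=200):
    """Iterate F <- alpha * S * F + (1 - alpha) * Y on a symmetric weighted graph."""
    n = len(weights)
    k = len(labels[0])
    degree = [sum(row) for row in weights]
    # Symmetric normalisation S = D^{-1/2} W D^{-1/2}; zero weights stay zero.
    s = [
        [w / math.sqrt(degree[i] * degree[j]) if w else 0.0 for j, w in enumerate(row)]
        for i, row in enumerate(weights)
    ]
    f = [row[:] for row in labels]
    for _ in range(iterations):
        f = [
            [
                alpha * sum(s[i][j] * f[j][c] for j in range(n))
                + (1 - alpha) * labels[i][c]
                for c in range(k)
            ]
            for i in range(n)
        ]
    return f


# Hypothetical toy network: nodes 0-1 are queries, nodes 2-5 are documents.
nodes = ["q0", "q1", "d0", "d1", "d2", "d3"]
edges = [
    (0, 2, 1.0), (0, 3, 1.0),   # clicks from q0
    (1, 4, 1.0), (1, 5, 1.0),   # clicks from q1
    (2, 3, 0.5), (4, 5, 0.5),   # document content similarity
    (3, 4, 0.1),                # weak similarity bridging the two groups
]
weights = [[0.0] * len(nodes) for _ in nodes]
for i, j, w in edges:
    weights[i][j] = weights[j][i] = w

# Only q0 (class 0) and d3 (class 1) are labeled.
labels = [[0.0, 0.0] for _ in nodes]
labels[0][0] = 1.0
labels[5][1] = 1.0

scores = propagate_labels(weights, labels)
predictions = [row.index(max(row)) for row in scores]
print(dict(zip(nodes, predictions)))
```

For this assumed update, the iteration is known to converge to the closed form (1 - alpha) * (I - alpha * S)^{-1} * Y, which mirrors the convergence claim in (2). Here click edges and similarity edges share one weight matrix for brevity, whereas the framework described above regularizes each link type separately.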
Heterogeneous information networks, composed of multiple types of objects and links, are ubiquitous in real life. This level of abstraction is powerful not only for representing and storing essential information about the real world, but also as a tool for mining knowledge from it by exploring the power of links. Effective analysis of large-scale heterogeneous information networks has therefore attracted substantial interest. Following a discussion of the history and current research on heterogeneous information networks, this dissertation focuses on key topics in information retrieval that can be addressed by constructing heterogeneous information networks: semi-supervised learning, document clustering, cluster description, and query suggestion. The main results and contributions of this dissertation are as follows.
     (1) We consider the semi-supervised classification problem on a query-document heterogeneous information network that combines the bipartite graph with content information from both sides. To strengthen the network structure, we introduce the class information of sample nodes. We investigate semi-supervised learning algorithms under two frameworks: a graph-based regularization framework and an iterative framework. In the regularization framework, we develop a cost function that accounts for both the direct relationship between the two entity sets and the content information from both sides, which leads to a significant improvement over the baseline methods.
     (2) We consider the semi-supervised classification problem on heterogeneous information networks with an arbitrary schema consisting of a number of object and link types. By applying graph regularization to preserve consistency over each relation graph, one per link type, we develop a classifying function that is sufficiently smooth with respect to the intrinsic structure collectively revealed by the labeled and unlabeled points. We propose an iterative framework on heterogeneous information networks in which the information of labeled data spreads to adjacent nodes iteratively until a steady state is reached. The class memberships of unlabeled data can then be inferred from those of labeled data according to their proximity in the network. Some classic semi-supervised learning algorithms can be viewed as special cases of the algorithm.
     (3) Two topic propagation models, TP-TS and TP-Unify, are proposed for rich-text query-document heterogeneous information networks. TP-TS treats topic modeling and the random walk as two independent stages: PLSA provides a simplified solution for modeling the topics of documents and queries, and the topic information then propagates on the query-document bipartite graph. TP-Unify uses a joint regularization framework that incorporates the heterogeneous information network directly into topic modeling by regularizing a statistical topic model. Its improvement over TP-TS is attributed to directly optimizing heterogeneous network analysis and topic modeling in a unified regularization framework.
     (4) A new method for extracting category labels is proposed. The basic idea is to convert cluster description into the ranking of queries within a cluster, which avoids extracting keywords from web documents. We present a ranking algorithm that combines the query-document click graph, the document affinity graph, and the web link graph, and can therefore effectively integrate the evaluations of users, web page creators, and web page writers.
     (5) A Term-Query bipartite graph is trained by extracting semantic relationships from the snippets clicked for each query. Combining it with the Query-URL graph and the Query-Flow graph yields a heterogeneous Term-Query-URL information network. Random walk with restart (RWR) is performed on this network for query suggestion. Taking both semantic and log information into account greatly improves the relevance of suggestions for long-tail queries. For new queries, a term vector is constructed from a probabilistic language model and used for suggestion. The experimental results show that our approach outperforms three baseline methods.
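To make the random walk with restart in (5) concrete, the following is a minimal sketch on a tiny Term-Query-URL graph. The node names, edge weights, restart probability, and the helper names `random_walk_with_restart` and `suggest` are illustrative assumptions, not values or code from the dissertation, and the Query-Flow edges are omitted for brevity.

```python
def random_walk_with_restart(edges, start, restart=0.15, iterations=100):
    """Return visiting probabilities of a walk that jumps back to `start`."""
    neighbours = {}
    for u, v, w in edges:  # undirected weighted edges
        neighbours.setdefault(u, {})[v] = w
        neighbours.setdefault(v, {})[u] = w
    scores = {node: 0.0 for node in neighbours}
    scores[start] = 1.0
    for _ in range(iterations):
        nxt = {node: 0.0 for node in neighbours}
        for u, nbrs in neighbours.items():
            total = sum(nbrs.values())
            for v, w in nbrs.items():
                # Follow an edge with probability proportional to its weight.
                nxt[v] += (1 - restart) * scores[u] * w / total
        nxt[start] += restart  # restart mass returns to the source query
        scores = nxt
    return scores


def suggest(scores, start):
    """Rank the other query nodes (prefix 'q:') by descending score."""
    queries = [n for n in scores if n.startswith("q:") and n != start]
    return sorted(queries, key=lambda n: scores[n], reverse=True)


# Hypothetical graph: 'q:' queries, 't:' terms, 'u:' clicked URLs.
edges = [
    ("q:cheap flights", "u:travel.example", 3.0),
    ("q:flight deals", "u:travel.example", 2.0),
    ("q:python tutorial", "u:docs.example", 4.0),
    ("q:cheap flights", "t:cheap", 1.0),
    ("q:cheap flights", "t:flight", 1.0),
    ("q:flight deals", "t:flight", 1.0),
    ("q:cheap flight tickets", "t:cheap", 1.0),   # sparse query: no clicks,
    ("q:cheap flight tickets", "t:flight", 1.0),  # reachable only via terms
    ("q:python tutorial", "t:python", 1.0),
]

source = "q:cheap flight tickets"
scores = random_walk_with_restart(edges, source)
suggestions = suggest(scores, source)
print(suggestions)
```

The source query has no click edges, so it reaches other queries only through shared terms; this is the situation in which the abstract expects semantic information to help long-tail queries.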
