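Research on Heterogeneous Information Network Analysis Models and Their Applications

English title: Research on Heterogeneous Information Network Analysis Model and Application
Author: Li Peng (李朋)
Degree level: Doctoral (Ph.D.)
Discipline: Computer Science and Technology
Keywords (Chinese): heterogeneous information network; network analysis; personalized query; clustering analysis; activity prediction
Keywords (English): heterogeneous information network; network analysis; clustering; personalized query; activity prediction
Degree year: 2013
Supervisor: Wen Junhao (文俊浩)
Discipline code: 0812
Degree-granting institution: Chongqing University
Submission date: 2013-05-01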

Abstract

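As the categories of information data diversify and the relationships among data grow more complex, information networks are becoming increasingly heterogeneous. How to use network analysis to mine useful knowledge from heterogeneous information networks has therefore become a new problem for information retrieval and knowledge mining. In a heterogeneous information network, the key elements involved in knowledge mining are mainly data, services, and human activities. Among these, data storage represented by relational databases provides a structured management model for massive amounts of information; functionality delivery represented by Web services lays the foundation for open, loosely coupled information platforms; and social network activity represented by microblogs offers new ways to share data and exchange information. With the diversification of data categories, increasingly frequent service access, and the move of social activities onto the network, demand for personalized data queries, clustering analysis, activity prediction, and similar tasks is growing rapidly. Research on heterogeneous information network analysis models and their applications in information retrieval and knowledge mining therefore has both theoretical and practical engineering significance.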
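In response to these trends and new problems, this thesis studies node ranking functions for heterogeneous information networks, building on the mining of relationships between heterogeneous objects and on a description model of heterogeneous information networks. Based on the description model and the ranking functions, and using heterogeneous Web service networks, relational database tuple networks, and social networks as cases, it studies new clustering, ranking, and activity prediction methods based on heterogeneous information network analysis.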
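The abstract does not give the exact form of the thesis's node ranking functions. As a hedged illustration only, the sketch below shows two link-based ranking functions of the kind used by RankClus (Sun et al., EDBT 2009) on a bi-typed network, for example services linked to tags: a degree-based simple ranking and a simplified authority ranking that omits RankClus's same-type link term. The matrix `W`, the function names, and the toy data are assumptions for illustration, not the thesis's definitions.

```python
import numpy as np

def simple_rank(W):
    """Degree-based ranking on a bi-typed network.

    W[i, j] is the link weight between X object i (e.g. a service)
    and Y object j (e.g. a tag). Each object's score is its share of
    the total link weight, so each score vector sums to 1.
    """
    total = W.sum()
    return W.sum(axis=1) / total, W.sum(axis=0) / total

def authority_rank(W, iters=100, tol=1e-12):
    """Mutual-reinforcement ranking (power iteration): an X object scores
    high if it links to high-scoring Y objects, and vice versa."""
    n_x, n_y = W.shape
    r_x = np.full(n_x, 1.0 / n_x)
    r_y = np.full(n_y, 1.0 / n_y)
    for _ in range(iters):
        new_y = W.T @ r_x              # Y scores collected from linked X objects
        new_y = new_y / new_y.sum()
        new_x = W @ new_y              # X scores collected from linked Y objects
        new_x = new_x / new_x.sum()
        delta = np.abs(new_x - r_x).sum() + np.abs(new_y - r_y).sum()
        r_x, r_y = new_x, new_y
        if delta < tol:
            break
    return r_x, r_y

# Toy network (hypothetical): 3 services (rows) x 3 tags (columns).
W = np.array([[1, 1, 0],
              [0, 1, 1],
              [0, 0, 1]])
print(simple_rank(W))
print(authority_rank(W))
```

Simple ranking only counts links, while authority ranking lets a service gain score by sharing tags with other well-connected services; here the second service, which links to both popular tags, ends up ranked first.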
The main work of the thesis includes the following:
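① Taking into account the trend toward heterogeneous information networks, and based on an analysis of the state of the art and open problems in clustering, personalized query, and social network prediction, a description model of heterogeneous information networks is developed using formal methods.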
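② Based on the description model, a ranking method based on heterogeneous information network analysis is proposed. According to different forms of network connection and different ranking rules, the method defines four types of ranking functions. A comparative case analysis of these ranking functions shows that the method can provide basic data-ranking support for network analysis.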
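③ Because attribute-based clustering does not support heterogeneous data and ignores data ranking, a clustering algorithm based on heterogeneous information network analysis is proposed from the perspective of relationships. Taking Web service clustering as an example, this leads to SNTClus, a service clustering algorithm based on heterogeneous service network analysis. SNTClus builds a heterogeneous service network description model from the participating objects (such as service tags) and their relationships, builds a multidimensional clustering measure model from a service ranking model, and clusters Web services by iterating between network partitioning and ranking. Experiments on the Titan service dataset show that SNTClus has low time cost and high clustering accuracy.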
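④ To address the weak support for personalization in current information queries, a personalized query method based on heterogeneous information network analysis is proposed. Using relational databases as the case study and targeting personalized top-k queries, the thesis proposes RNRank, a relational database ranking method based on heterogeneous tuple network analysis. RNRank builds a tuple ranking model by extracting a heterogeneous tuple network and analyzing the associations within it. Depending on whether data category attributes are considered, it provides RNRank-I, a tuple ranking algorithm for single-category data, and RNRank-II, a multi-category tuple ranking algorithm based on clustering analysis. Experiments on the IMDB movie dataset and the German Credit dataset show that RNRank achieves high ranking efficiency and accuracy.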
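⑤ To predict social network activities, a social network activity prediction model based on heterogeneous information network analysis is proposed. The model combines four information activity prediction methods built on different network structure characteristics: by comparing their relative accuracy, it assigns them different weighting coefficients and forms a composite prediction formula that predicts how many times a message will be reposted and how many times it is likely to be viewed. Using real Sina Weibo data, the thesis verifies the feasibility and accuracy of the model for social network activity prediction.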
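The exact measure model of SNTClus is not given in this abstract. The loop below is a hedged sketch in the spirit of ranking-based clustering such as RankClus, not SNTClus itself: it illustrates the partition-then-rank iteration described in item ③ by alternating between ranking tags within each service cluster and reassigning each service to the cluster whose tag ranking it matches best. Cosine similarity is an assumed choice (RankClus itself uses mixture-model posteriors), and all names and data are hypothetical.

```python
import numpy as np

def rank_cluster(W, k, iters=20, seed=0):
    """Cluster X objects (rows of W, e.g. services) by alternating
    (1) a simple ranking of Y objects (e.g. tags) inside each cluster and
    (2) reassignment of each X object to the most similar cluster ranking."""
    rng = np.random.default_rng(seed)
    n_x, n_y = W.shape
    labels = np.arange(n_x) % k            # balanced initial partition
    rng.shuffle(labels)                    # randomized in place
    # Each X object's own distribution over Y objects.
    row_dist = W / np.maximum(W.sum(axis=1, keepdims=True), 1e-12)
    ranks = np.zeros((k, n_y))
    for _ in range(iters):
        # Step 1: conditional ranking of Y objects within each cluster.
        ranks = np.zeros((k, n_y))
        for c in range(k):
            members = W[labels == c]
            if members.sum() > 0:
                ranks[c] = members.sum(axis=0) / members.sum()
        # Step 2: cosine similarity between objects and cluster rankings.
        norms = (np.linalg.norm(row_dist, axis=1, keepdims=True)
                 * np.linalg.norm(ranks, axis=1))
        sim = (row_dist @ ranks.T) / np.maximum(norms, 1e-12)
        new_labels = sim.argmax(axis=1)
        if np.array_equal(new_labels, labels):
            break                          # partition is stable
        labels = new_labels
    return labels, ranks

# Toy data (hypothetical): services 0-2 use tags 0-1, services 3-5 use tags 2-3.
W = np.array([[2, 1, 0, 0],
              [1, 2, 0, 0],
              [1, 1, 0, 0],
              [0, 0, 2, 1],
              [0, 0, 1, 2],
              [0, 0, 1, 1]])
labels, tag_ranks = rank_cluster(W, k=2)
print(labels)
print(tag_ranks)
```

A by-product of this design is that each cluster comes with its own tag ranking, which matches the abstract's point that ranking and clustering are computed together rather than in separate passes.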
With the diversification of information and the increasingly complex relationships between information objects, information networks are becoming heterogeneous. Faced with heterogeneous network structures, finding useful knowledge and improving the utilization of information through network analysis is one of the most urgent problems. In the heterogeneous information world, data, services, and human activities are the key elements involved in the use of information. The relational database model provides structured storage and management for massive amounts of information data; Web services, as the representative technology for packaging and developing functionality, make it possible to build open, loosely coupled information platforms; and social networks provide an open platform for information sharing and dissemination. With the diversification of data categories, frequent access to Web services, and the rapid development of social networks, there is a growing demand for information ranking and knowledge mining over heterogeneous data networks, service networks, and social networks.
In this thesis, based on an analysis of current information networks and heterogeneous information networks and of the problems in current research, we mine the relationships between heterogeneous objects in heterogeneous information networks, study a description model of heterogeneous information networks in depth, and study ranking functions based on this description model that take the link structure of the network into account. From the dimension of relationships, we propose new clustering, ranking, and activity prediction methods based on heterogeneous information network analysis. As case studies, we carry out basic research on, and provide specific solutions for, heterogeneous service networks, relational database tuple networks, and social networks.
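The ranking model for relational database tuple networks is only summarized here. As a hedged illustration of the general idea of link-based ranking over tuples, the sketch treats tuples as nodes, foreign-key references as undirected links, and scores tuples with standard PageRank. Treating links as undirected, the damping factor 0.85, the function names, and the toy movie/actor/cast tuples are all assumptions for illustration, not the thesis's RNRank model.

```python
import numpy as np

def tuple_network(fk_links, tuples):
    """Build a symmetric adjacency matrix over tuples.

    fk_links: (referencing_tuple, referenced_tuple) pairs, one per
    foreign-key reference; each reference becomes an undirected link.
    """
    index = {t: i for i, t in enumerate(tuples)}
    A = np.zeros((len(tuples), len(tuples)))
    for a, b in fk_links:
        A[index[a], index[b]] = A[index[b], index[a]] = 1.0
    return A, index

def pagerank(A, d=0.85, iters=200, tol=1e-12):
    """Standard PageRank by power iteration; scores sum to 1."""
    n = A.shape[0]
    out = A.sum(axis=1)
    # Row-stochastic transitions; a tuple with no links jumps uniformly.
    P = np.where(out[:, None] > 0, A / np.maximum(out[:, None], 1e-12), 1.0 / n)
    r = np.full(n, 1.0 / n)
    for _ in range(iters):
        new = (1 - d) / n + d * (P.T @ r)
        if np.abs(new - r).sum() < tol:
            return new
        r = new
    return r

# Toy schema (hypothetical): cast(movie_id -> movies, actor_id -> actors).
tuples = ["m1", "m2", "a1", "a2", "c1", "c2", "c3"]
fk_links = [("c1", "m1"), ("c1", "a1"),
            ("c2", "m1"), ("c2", "a2"),
            ("c3", "m2"), ("c3", "a1")]
A, index = tuple_network(fk_links, tuples)
scores = pagerank(A)
for t in sorted(tuples, key=lambda t: -scores[index[t]]):
    print(t, round(float(scores[index[t]]), 4))
```

Movie `m1`, which two cast tuples reference, outranks `m2`, which only one references; a real tuple network would typically also weight links by schema and relation type rather than treating all references equally.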
The research work in this thesis includes the following:
① We analyze the trends and heterogeneous characteristics of current information network development, analyze the current state and open problems of research on clustering, personalized query, and social network prediction, and study a formal description model of heterogeneous information networks.
② Based on the heterogeneous information network description model, we study several network-analysis-based ranking functions that account for different forms of network connectivity and different ranking rules. A comparison and analysis of their ranking results provides methodological support for network-analysis-based ranking.
③ To address the problems of attribute-based clustering, we propose, from the perspective of relationships, a novel clustering algorithm based on heterogeneous information network analysis and study its basic idea and procedure. As a special case of heterogeneous information network applications, to solve the problem of Web service clustering, we propose a new clustering algorithm based on service tags and heterogeneous service network analysis that takes network structure into account. It treats the clustering and ranking of services jointly: the ranking model provides the measure vectors for the clustering process and produces the ranking results within each service cluster. To evaluate performance and accuracy, we designed experiments on a real Web service dataset from Titan.
④ To improve support for personalization in information searches and queries, we propose a personalized query method based on heterogeneous information network analysis that mines possible ranking results while taking hidden categories into account. As a case study of a specific network, we propose a new ranking algorithm for personalized top-k queries in relational databases, which analyzes the tuple relations and schemas linked by foreign keys, and we study how to extract heterogeneous tuple networks. Based on the relational tuple network structure, we study the ranking model and procedure of the proposed algorithm. The algorithm covers single-category ranking and multi-category ranking, where the latter must consider the latent categories hidden behind the data. In the experiments, real databases from IMDB and German Credit are used to evaluate the performance and accuracy of the proposed ranking algorithm.
⑤ Combining heterogeneous information network analysis with the demand for social network activity forecasting, we propose a social network activity prediction model based on heterogeneous information network analysis. Considering network structure and message propagation in social networks, we propose four different prediction models for the number of retweets and possible views of a message. By combining these four models with weights based on their prediction accuracies, we construct a composite prediction model. Experiments with real Weibo data evaluate the feasibility and accuracy of the approach.
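The abstract states that four structure-based predictors are combined with weights derived from their relative accuracy, but gives neither the accuracy measure nor the weighting rule. A minimal sketch, assuming weights proportional to each model's accuracy on held-out messages and an illustrative accuracy of one minus the mean relative error (floored at 0); the function names and numbers are hypothetical.

```python
import numpy as np

def model_accuracy(pred, actual):
    """Hypothetical accuracy: 1 minus the mean relative error, floored at 0."""
    pred = np.asarray(pred, dtype=float)
    actual = np.asarray(actual, dtype=float)
    rel_err = np.abs(pred - actual) / np.maximum(actual, 1.0)
    return max(0.0, 1.0 - rel_err.mean())

def accuracy_weights(val_preds, val_actual):
    """Weight each model in proportion to its accuracy on held-out messages."""
    acc = np.array([model_accuracy(p, val_actual) for p in val_preds])
    if acc.sum() == 0:
        return np.full(len(acc), 1.0 / len(acc))   # fall back to equal weights
    return acc / acc.sum()

def composite_predict(preds, weights):
    """Weighted sum of the individual models' predictions (e.g. retweet counts)."""
    return weights @ np.asarray(preds, dtype=float)

# Four models' retweet-count predictions for three validation messages.
val_actual = [10, 20, 30]
val_preds = [[10, 20, 30],   # exact
             [5, 10, 15],    # off by 50%
             [20, 40, 60],   # off by 100%
             [0, 0, 0]]      # off by 100%
w = accuracy_weights(val_preds, val_actual)
# Each model's prediction for one new message.
new_preds = [[12], [6], [30], [0]]
print(w, composite_predict(new_preds, w))
```

Models whose validation error reaches 100% receive zero weight under this rule; a softer scheme (for example, normalized inverse error) would keep them in the mix, and which rule suits the data is an empirical question.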
