Research on New Methods for Link-Based Network Data Classification and Link Prediction
Abstract
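With the rapid development of information technology, represented by the Internet, human society has moved swiftly into the network age. Demand for network analysis keeps growing, and network data mining has become an important research topic in data mining. Network data mining aims to extract implicit knowledge from network data sources and to carry out tasks such as entity classification, link prediction, community discovery, entity ranking, and network clustering, in order to analyze the properties, functions, and dynamics of networks and the relationships among networks. This thesis focuses on the entity classification and link prediction tasks within network data mining. Building on a detailed analysis of existing entity classification methods, it studies the role of link relations in entity classification and proposes solutions for different types of network data: for networks whose entities have attributes and labels, a regularized classification model based on principal component analysis and the extreme learning machine; for networks whose entities have attributes and labels but in which few entities are labeled, an active collective classification method that combines feature selection and link filtering; and for networks whose entities have only label information, a collective classification framework that integrates network topological features and label distribution information. In addition, the thesis summarizes the state of research on link prediction and, for sparsely linked networks, proposes a collective link prediction framework together with two concrete algorithms. Experiments on several public datasets show that the proposed methods achieve good results.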
In recent years, with the rapid development of information technology represented by the Internet, human society has entered a network age. The demand for network analysis keeps rising, and network data mining has become an important new research field in data mining. It has been widely applied in numerous domains, including document classification, protein structure prediction, natural language processing, and social network analysis. Network data mining aims to extract implicit knowledge from network data sources and to perform learning tasks such as entity classification, link prediction, community discovery, entity ranking, and network clustering, in order to analyze the nature, function, and dynamic change of networks and to understand the relationships between networks. As key parts of network data mining, entity classification and link prediction have attracted particular attention from researchers, and a great deal of work has been done. However, the accuracy of existing algorithms still needs to be improved, and there is little work on sparsely labeled and sparsely linked networks. The thesis takes these problems as its main topic.
    The thesis first analyzes current entity classification methods in detail, including their underlying assumptions, the types of network data they suit, and their application domains. It then studies the role of links in the entity classification task and presents a series of solutions for entity classification on different types of network data. For networks whose entities have attribute information, it presents a new regularization classification model based on principal component analysis and the extreme learning machine. For networks whose entities have attribute information but in which few entities are labeled, it presents a new active collective classification method that combines feature selection and link filtering. For networks whose entities have only label information, it presents an integrated collective classification framework for sparsely labeled networks. In addition, the thesis summarizes the state of research on link prediction and presents a collective link prediction framework for sparsely linked networks.
    The detailed research results are as follows:
    1. A thorough review of research on entity classification and link prediction.
    The thesis introduces and summarizes the research tasks of entity classification and link prediction, and points out problems in current approaches and directions for future research.
    2. For networks whose entities have attribute information, a new regularization classification model based on principal component analysis and the extreme learning machine.
    The algorithm improves on current regularization methods in that it contains not only the smoothness constraints of the defined function but also takes into account the label distribution in the network. It adds two new regularization terms: an intra-class similarity term and an inter-class difference term. In the implementation, we extend the extreme learning machine so that it can be used for semi-supervised problems, and we further derive the definition of the hidden-layer weights so that it fits the new function. Experimental results show that our method performs well when more than 25% of the nodes are labeled.
    3. For networks whose entities have attribute information but in which few entities are labeled, a new active collective classification method that combines feature selection and link filtering.
    To improve the classification accuracy of collective classification methods, we extend them so that attribute information and link information are combined to perform classification during the collective inference procedure. The algorithm first uses feature selection to find important features and then constructs links according to attribute similarity. It then analyzes the original links in the network and selects the useful ones. Finally, it combines the two kinds of links to classify nodes collectively. Experiments show that our method handles the sparse labeling problem very well.
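    The abstract does not give the details of this method, so the following is only a minimal sketch of the general idea of adding attribute-similarity links to the original links before classifying by neighbors. Everything in it is an assumption made for illustration: the cosine similarity measure, the threshold, the single-pass majority vote (a stand-in for full collective inference), and the toy data. Feature selection and link filtering are omitted.

```python
import math

def cosine(x, y):
    """Cosine similarity between two equal-length attribute vectors."""
    dot = sum(a * b for a, b in zip(x, y))
    nx = math.sqrt(sum(a * a for a in x))
    ny = math.sqrt(sum(b * b for b in y))
    return dot / (nx * ny) if nx and ny else 0.0

def similarity_links(features, threshold):
    """Link every node pair whose cosine similarity reaches the threshold."""
    nodes = sorted(features)
    links = set()
    for i, u in enumerate(nodes):
        for v in nodes[i + 1:]:
            if cosine(features[u], features[v]) >= threshold:
                links.add((u, v))
    return links

def neighbour_vote(links, labels, node):
    """Return the majority label among the labelled neighbours of a node."""
    counts = {}
    for u, v in links:
        other = v if u == node else u if v == node else None
        if other is not None and other in labels:
            counts[labels[other]] = counts.get(labels[other], 0) + 1
    return max(sorted(counts), key=counts.get) if counts else None

# Hypothetical toy network: four nodes, two of them labelled.
features = {"n1": [1.0, 0.0], "n2": [0.9, 0.1], "n3": [0.0, 1.0], "n4": [0.1, 0.9]}
original_links = {("n1", "n3")}
labels = {"n1": "A", "n3": "B"}

# Combine the original links with links built from attribute similarity.
sim_links = similarity_links(features, threshold=0.9)
combined_links = original_links | sim_links

predictions = {n: neighbour_vote(combined_links, labels, n)
               for n in sorted(features) if n not in labels}
print(predictions)
```

    In this toy case the unlabeled nodes have no original links at all, so only the similarity links make a neighbor-based prediction possible, which is the situation the sparse setting is concerned with.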
    4. For networks whose entities have only label information, a collective classification framework for sparsely labeled networks.
    The framework divides the attributes of a node into two categories: structure attributes and label attributes. The algorithm uses different attributes at different stages and integrates them to perform classification together. Based on this framework, we present a new classifier built on the structure attributes of nodes, called the Laplacian classifier, and a new classifier built on the label distribution, called the link pattern classifier. We compare our approach with typical collective classification methods, and the results indicate that our method performs better than the other methods.
    5. A collective link prediction framework for sparsely linked networks.
    We propose a collective link prediction framework that predicts related links simultaneously, so that it can handle sparsely linked networks as well as networks whose links depend on each other. Based on this framework, two new link prediction methods are presented: collective resource allocation and collective random walk. We test our methods on several networks, and the results indicate that they obtain higher prediction accuracy, especially in the sparsely linked case.
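    The collective variants are not specified in this abstract. For orientation only, the sketch below shows the standard (non-collective) resource allocation index, which scores a candidate link between two nodes by summing the inverse degree of each of their common neighbors. The function name and the toy graph are hypothetical, and this is the usual baseline index, not the thesis's collective method.

```python
from itertools import combinations

def resource_allocation_scores(adj):
    """Score every non-adjacent node pair with the resource allocation index.

    adj maps each node to the set of its neighbours (undirected graph).
    """
    scores = {}
    for u, v in combinations(sorted(adj), 2):
        if v in adj[u]:
            continue  # the link already exists, nothing to predict
        common = adj[u] & adj[v]
        # Each common neighbour z contributes 1 / degree(z).
        scores[(u, v)] = sum(1.0 / len(adj[z]) for z in common)
    return scores

# Hypothetical toy graph: only the pair (a, d) is not yet linked.
toy_graph = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b", "d"},
    "d": {"b", "c"},
}
ra = resource_allocation_scores(toy_graph)
print(ra)
```

    Candidate pairs are then ranked by score, and the highest-scoring pairs are predicted as missing links.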
    Network mining has attracted the interest of many researchers. This thesis studies the entity classification and link prediction problems in network mining and presents effective learning algorithms for different data types. Research on classification and link prediction in network data has both theoretical and practical significance.
引文
[1] Jiawei Han, Micheline Kamber著,范明、孟小峰译.数据挖掘概念与技术[M].机械工业出版社,2007
    [2] Stanley L., Simon K. Predicting protein function from protein/protein interaction data: aprobabilistic approach [J]. Bioinformatics,2003,19(suppl.1): i197–i200.
    [3] Krebs V. An introduction to social network analysis [M]. Retrieved February20,2005.
    [4] Carvalho V., Cohen W.W. On the collective classification of email speech acts[R]. InSpecial Interest Group on Information Retrieval,2005.
    [5] Tyler JR, Wilkinson DM, Huberman BA. Email as spectroscopy: Automated discovery ofcommunity structure within organizations[C]. Proc. of the1st Int’l Conf. on Communitiesand Technologies,2003.
    [6] Newman MEJ. Co-authorship networks and patterns of scientific collaboration [J]. Proc.of the National Academy of Science,2004,101(1):52005205.
    [7] Girvan M, Newman MEJ. Community structure in social and biological networks [J].Proc. of the National Academy of Science,2002,9(12):78217826.
    [8] Palla G, Derenyi I, Farkas I, Vicsek T. Uncovering the overlapping community structuresof complex networks in nature and society [J]. Nature,2005,435(7043):814818.
    [9] Palla G, Barabási AL, Vicsek T. Quantifying social group evolution [J]. Nature,2007,446(7136):664667.
    [10] Wang Z, Zhang J. In search of the biological significance of modular structures in proteinnetworks [J]. PLOS Computational Biology,2007,3(6):10111-1021.
    [11] Spirin V, Mirny LA. Protein complexes and functional modules in molecular networks [J].Proc. of the National Academy of Science,2003,100(21):1212312128.
    [12] Farutin V, Robison K, Lightcap E, Dancik V, Ruttenberg A, Letovsky S, Pradines J.Edge-Count probabilities for the identification of local protein communities and theirorganization [J]. Proteins: Structure, Function, and Bioinformatics,2006,62(3):800818.
    [13] Wilkinson DM, Huberman BA. A method for finding communities of related genes [J].Proc. of the National Academy of Science,2004,101(Suppl.1):52415248.
    [14] Ravasz E, Somera AL, Mongru DA. Hierarchical organization of modularity in metabolicnetworks. Science [J],2002,297(5586):15511555.
    [15] Flake GW, Lawrence S, Giles CL, Coetzee FM. Self-Organization and identification ofWeb communities [J]. IEEE Computer,2002,35(3):6671.
    [16] Li X, Liu B, Yu PS. Discovering overlapping communities of named entities [C]. Proc. ofthe10th European Conf. on Principles and Practice of Knowledge Discovery inDatabases,2006,593600.
    [17] Ino H, Kudo M, Nakamura A. Partitioning of Web graphs by community topology [C].Proc. of the14thInt’l Conf. on World Wide Web. New York: ACM Press,2005.661669.
    [18] Sidiropoulos A, Pallis G, Katsaros D, Stamos K, Vakali A, Manolopoulos Y.Prefetching in content distribution networks via Web communities identification andoutsourcing [J]. World Wide Web,2008,11(1):3970.
    [19] Almeida RB, Almeida VAF. A community-aware search engine [C]. Proc. of the13thInt’l Conf. on World Wide Web. New York: ACM Press,2004.413421.
    [20] Anguelov D., Tasker B., Chatalbashev V., Koller D., Gupta, D. Heitz G., Ng. A.Discriminative learning of Markov Random Fields for segmentation of3d scan data [C].In IEEE Computer Society Conference on Computer Vision and Pattern Recognition,2005.
    [21] Guimera R, Amaral LAN. Functional cartography of complex metabolic networks [J].Nature,2005,433(7028):895900.
    [22] Reichardt J, Bornholdt S. Detecting fuzzy community structures in complex networkswith a Potts model [J]. Physical Review Letters,2004,93(19):218701.
    [23] Garlaschelli D, Loffredo MI. Patterns of link reciprocity in directed networks [J].Physical Review Letters,2004,93(26):268701.
    [24] Yang B, Cheung WK, Liu J. Community mining from signed social networks [J]. IEEETrans. on Knowledge and Data Engineering,2007,19(10):13331348.
    [25] Brandes U, Delling D, Gaertler M, G rke R, Hoefer M, Nikoloski Z, Wagner D. Onmodularity clustering [J]. IEEE Trans. on Knowledge and Data Engineering,2008,20(2):172188.
    [26] Shiga M, Takigawa I, Mamitsuka H. A spectral clustering approach to optimallycombining numerical vectors with a modular network [C]. Proc. of the13th ACMSIGKDD Int’l Conf. on Knowledge Discovery and Data Mining. New York: ACM Press,2007.647656.
    [27] Zhou D, Councill I, Zha H, Giles CL. Discovering temporal communities from socialnetwork documents [C]. Proc. of the7th IEEE Int’l Conf. on Data Mining. New York:IEEE Society,2007.745750.
    [28] Ino H, Kudo M, Nakamura A. Partitioning of Web graphs by community topology [C].Proc. of the14thInt’l Conf. on World Wide Web. New York: ACM Press,2005.661669.
    [29] Almeida RB, Almeida VAF. A community-aware search engine [C]. Proc. of the13thInt’l Conf. on World Wide Web. New York: ACM Press,2004.413421.
    [30] Macskassy S. A, Provost. F. Classification in networked data: A toolkit and a univariatecase study [J]. Journal of Machine Learning Research,2007,8:935-983.
    [31] Sen P, Getoor. L Link-based classification [R]. Technical Report CS-TR-4858,University of Maryland, February2007.
    [32] Getoor L., Diehl. C. P. Link mining: a survey [J]. SIGKDD Explorations: Special Editionon Link Mining,2005,7(2):3–12.
    [33] Jensen D., Neville J., Gallagher B. Why collective inference improves relationalclassification [C]. In KDD’04: Proc. of the tenth ACM SIGKDD Int. Conf. onKnowledge discovery and data mining, New York, USA,2004,593–598.
    [34] Ising E.. Beitrag zur Theorie des Ferromagnetismus [J]. Zeitschrift f. Physik,1925,31:253–258.
    [35] Potts R. B. Some generalized order-disorder transformations [J]. Cambridge PhilosophicSociety,1952,48:106–109.
    [36] Besag J. Spatial interaction and the statistical analysis of lattice systems [J]. Journal of theRoyal Statistical Society,1974,36(2):192–236.
    [37] S. Geman, D. Geman. Stochastic relaxation, Gibbs distributions and the Bayesianrestoration of images [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence(PAMI),1984,6:721–741.
    [38] J. Besag. On the statistical analysis of dirty pictures [J]. Journal of the Royal StatisticalSociety,1986,48(3):259-302.
    [39] S. Chakrabarti, B. Dom, P. Indyk. Enhanced hypertext categorization using hyperlinks
    [C]. In SIGMOD’98: Proc. of the1998ACM SIGMOD Int. Conf. On Management ofdata, New York, USA,1998,307–318.
    [40] B. Taskar, E. Segal, D. Koller. Probabilistic classification and clustering in relational data
    [C]. In Proc. of the Seventeenth Int. Joint Conf. on Artificial Intelligence,2001,870-878.
    [41] B.Taskar, C.Guestrin, D.Koller. Max-margin Markov Networks [C]. In NeuralInformation Processing Systems,2003.
    [42] Q. Lu, L. Getoor. Link-based classification [C]. Proc.12th Int'l Conf. Machine Learning(ICML), AAAI Press,2003,496-503.
    [43] J. Callut, K. Francoisse, M. Saerens, P. Dupont. Semi-supervised classification fromdiscriminative random walks [C]. In Lecture Notes in Artificial Intelligence No.5211,ECML PKDD08,2008,162-177.
    [44] E. Segal, H. Wang, D. Koller. Discovering molecular pathways from protein interactionand gene expression data [J]. Bioinformatics,2003,19:264-272.
    [45] B.Taskar, V.Chatallbashev, D.Koller, C.Guestrin. Learning structured prediction models:a large margin approach [C]. Proc. Of the22ndinternational conference on machinelearning,2005.
    [46] J. D. Lafferty, A. McCallum, F. C. N. Pereira. Conditional random fields: Probabilisticmodels for segmenting and labeling sequence data [C]. In Proceedings of theInternational Conference on Machine Learning., pages282-289,2001.
    [47] L. Chen, M. Wainwright, M. Cetin, A. Willsky. Multi-target multi-sensor data associationusing the tree-reweighted max-product algorithm [C]. In SPIE Aerosense conference,2003.
    [48] C. Castillo, D. Donato, A. Gionis. Know your neighbors: web spam detection using theweb topology [C]. In SIGIR07,2007,423-430.
    [49] J Abernethy, O. Chapelle, C. Castillo. Witch: A new approach to web spam detection [R].Technical report, Yahoo! Research,2008.
    [50] S. A. Macskassy, F. Provost. Suspicion scoring based on guilt-by-association, collectiveinference, and focused data access [C]. In Proceedings of the First InternationalConference on Intelligence Analysis (IA),2005
    [51] K. Tumulty. Inside Bush’s Secret Spy Net [J]. Time,2006,167(21).http://www.time.com/time/archive/preview/0,10987,1194021,00.html.
    [52] P. Domingos, M. Richardson. Mining the network value of customers [C]. In Proceedingsof the Seventh ACM SIGKDD International Conference on Knowledge Discovery andData Mining,2001,57–66.
    [53] Z. Huang, H. Chen, D. Zeng. Applying associative retrieval techniques to alleviate thesparsity problem in collaborative filtering [J]. ACM Transactions on Information Systems(TOIS),2004,22(1):116–142.
    [54] F. Fouss, L. Yen, A. Pirotte, M. Saerens. An experimental investigation of graph kernelson a collaborative recommendation task [J]. In ICDM '06: Proc. of the6th Int. Conf. onData Mining, Washington, DC, USA,2006,863-868.
    [55] S. Hill, F. Provost, C. Volinsky. Network-based marketing: Identifying likely adopters viaconsumer networks. Statistical Science,2006a,22(2):256–276.
    [56] J.Neville, D.Jensen. Iterative classification in relational data [C]. In AAAI workshop onstatistical relational learning,2000
    [57] D.Jensen, J.Neville. Linkage and autocorrelation cause feature selection bias in relationallearning [C]. In ICML02: proceedings of the nineteenth international conference onmachine learning,2002.
    [58] J.Neville, D.Jensen. Relational dependency networks [J]. Journal of machine learningresearch,8:653-692,2007.
    [59] B.Taskar, P.Abbeel, D.Koller. Discriminative probabilistic models for relational data [C].In proceedings of the Annual conference on Uncertainty in Artificial Intelligence,2002.
    [60] L.Getoor, E.Segal, D.Koller. Probabilistic models of text and link structure for hypertextclassification [C]. In IJCAI workshop on text learning: beyond supervision,2001.
    [61] S. A. Macskassy, F. Provost. A simple relational classifier [C]. In Proceedings of theMulti-Relational Data Mining Workshop at the Ninth ACM SIGKDD InternationalConference on Knowledge Discovery and Data Mining,2003.
    [62] C. Perlich, F. Provost. Distribution-based aggregation for relational learning withidentifier attributes [J]. Machine Learning,2006,62(1/2):65–105.
    [63] C. Perlich, F. Provost. Aggregation-based feature invention and relational concept classes
    [C]. In Proceedings of the Ninth ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining,2003,167–176.
    [64] J.Neville, D.Jensen. Relational dependency networks. Journal of machine learningresearch,8:653-692,2007.
    [65] J.Neville, D.Jensen. Collective classification with relational dependency networks [C].Proc. Of the2ndmulti-relational data mining workshop,2003.
    [66] K. Murphy, Y. Weiss, M. I. Jordan. Loopy belief propagation for approximate inference:An empirical study [C]. In Proceedings of the Annual Conference on Uncertainty inArtificial Intelligence,1999,467-475.
    [67] Weiss, Y. Comparing the mean field method and belief propagation for approximateinference in MRFs [M]. Manfred Opper and David Saad (eds). MIT Press. Chapter:Advanced Mean Field Methods,2001.
    [68] D. Greig, B. Porteous, A. Seheult. Exact maximum a posteriori estimation for binaryimages [J]. Journal of the Royal Statistical Society,1989,51(2):271–279.
    [69] A. Blum, S. Chawla. Learning from labeled and unlabeled data using graph mincuts [C].In Proceedings of the Eighteenth International Conference on Machine Learning (ICML),2001,19–26.
    [70] X. Zhu, Z. Ghahramani, J. Lafferty. Semi-supervised learning using Gaussian fields andharmonic functions [C]. In Proceedings of the Twentieth International Conference onMachine Learning (ICML),2003,912–919.
    [71] D, Zhou, O. Bousquet, T.N.Lal, J.Weston, B.Scholkopf. Learning with local and globalconsistency [C]. In NIPS,2003.
    [72] R. I. Kondor, J. D. Lafferty. Diffusion kernels on graphs and other discrete input spaces
    [C]. In ICML '02: Proc. of the19th Int. Conf. on Machine Learning,2002,315-322.
    [73] A. Smola, R. Kondor. Kernels and regularization on graphs [C]. In Proceedings of the2003Conference on Computational Learning Theory (COLT) and Kernels Workshop,2003,144-158.
    [74] W. W.Cohen. Stacked sequential learning [C]. In IJCAI,2005,671-676.
    [75] C.Perlich, F.Provost, Distribution based aggregation for relational learning with identifierattributes [J]. Machine Learning,2006,62(1):65-105.
    [76] L.McDowell, K.Gupta, D.Aha. Cautious inference in collective classification [C].AAAI-2007,2007,596-601.
    [77] L. McDowell, K.M.Gupta, D.W.Aha. Case-Based Collective Classification [C]. FLAIRSConference2007:399-404.
    [78] M.Bilgic, L.Getoor. Effective label acquisition for collective classification.14thACMSIGKDD international conference on Knowledge discovery and data mining,2008,43-51.
    [79] C. Desrosiers, G. Karypis. Within-network classification using local structure similarity.ECML/PKDD (1)2009:260-275
    [80] M.Rattigan, M.Maier, D.jensen. Exploring network structure for active inference incollective classification [C]. Seventh IEEE international conference on data miningworkshops,2007.
    [81] S.Macskassy. Improving learning in networked data by combining explicit and minedlinks [C]. In proceeding of the22th conference on Artificial Intelligence,2007,590-595.
    [82] B.Gallagher, H.Tong, T.Eliassi-Rad, C.Faloustsos. Using ghost edges for classification insparsely labeled networks [C].14thACM SIGKDD international conference onknowledge discovery and data mining,2008,256-264.
    [83] S. Adafre and M. De Rijke. Discovering missing links in Wikipedia [C]. In Proceedingsof the Eleventh ACM SIGKDD International Conference on Knowledge Discovery andData Mining Workshop on Link Discovery,2005
    [84] M. Al Hasan, V. Chaoji, S. Salem, M. Zaki. Link prediction using supervised learning
    [C]. In Proceedings of SDM '06: SIAM Data Mining Conference Workshop on LinkAnalysis, Counter-terrorism and Security,2006.
    [85] H. Kashima, N. Abe. A parameterized probabilistic model of network evolution forsupervised link prediction [C]. In Proceedings of the Sixth IEEE International Conferenceon Data Mining,2006.
    [86] A. Popescul, L. H. Ungar. Statistical relational learning for link prediction [C]. InProceedings of the IJCAI Workshop on Learning Statistical Models from Relational Data,2003.
    [87] D. L.Nowell, J. Kleinberg. The link prediction problem for social networks [C]. In Proc.of the12th International Conference on Information and Knowledge Management,2003.
    [88] H.Yu, P.Braun, High-quality binary protein interaction map of the yeast interactomenetwork [J], Science,2008,322:104-110.
    [89] A. Potgieter, K. April, R. Cooke, I. O. Osunmakinde. Temporality in link prediction:Understanding social complexity [J]. Journal of Transactions on EngineeringManagement,2006,11(1):83-96.
    [90] M. Rattigan, D. Jensen. The case for anomalous link discovery [J]. ACM SIGKDDExplorations Newsletter,2005,7(2):41-47.
    [91] J. Schafer, J. Konstan, J. Riedl. E-commerce recommendation applications [J], Data Min.Knowl. Discov.2001,5:115-153.
    [92] S. Zhou, R.J. Mondragón, Accurately modeling the internet topology [J], Phys. Rev. E70,2004,066108.
    [93] S. Carmi, S. Havlin, S. Kirkpatrick, Y. Shavitt, E. Shir, A model of Internet topologyusing k-shell decomposition [C], Proc. Natl. Acad. Sci. USA,2007,104(27):11150-11154.
    [94] P. Holme, M. Huss, Role-similarity based functional prediction in networked systems:application to the yeast proteome [J], J. R. Soc. Interface,2005,2:327-333.
    [95] Z. Huang, D.D. Zeng. A link prediction approach to anomalous email detection [C], in:Proceedings of2006IEEE International Conference on Systems, Man, and Cybernetics,Taipei, Taiwan,2006,1131-1137.
    [96] K. Dasgupta, R. Singh, B. Viswanathan, D. Chakraborty, S. Mukherjea, A.A. Nanavati, A.Joshi, Social ties and their relevance to churn in mobile telecom networks [C], in:Proceedings of the11th International Conference on Extending Database Technology:Advances in Database Technology, ACM Press, New York,2008,668-678.
    [97] J.A. Hanely, B.J. McNeil, The meaning and use of the area under a receiver operatingcharacteristic (ROC) curve [J], Radiology1982,143:29-36.
    [98] S. Geisser. Predictive Inference: An Introduction [M], Chapman and Hall, New York,1993.
    [99] J.L. Herlocker, J.A. Konstann, K. Terveen, J.T. Riedl. Evaluating collaborative filteringrecommender systems [J], ACM Trans. Inf. Syst.2004,22(1):5-53.
    [100] E. A. Leicht, P. Holme, M. E. J. Newman. Vertex similarity in networks [J], Phys. Rev. E73,2006,026120.
    [101] T. Zhou, L. Lü, Y. C. Zhang. Predicting missing links via local information [J]. Eur. Phys.J. B.2009,71(4):623-630.
    [102] L. Lü, C. H. Jin, T. Zhou. Similarity index based on local paths for link prediction ofcomplex networks [J]. Phys. Rev. E80,2009,046122.
    [103] W. Liu, L. Lü. Link prediction based on local random walk [J]. Eur. phys. Lett.89,2010,58007.
    [104] N. Friedman, L. Getoor, D. Koller, A. Pfeffer. Learning probabilistic relationalmodels[C]. Proceedings of the16th International Joint Conference on ArtificialIntelligence, Stockholm, Sweden,1999,1300-1329.
    [105] D. Heckerman, C. Meek, D. Koller, Probabilistic entity-relationship models, PRMS, andplate models [C]. Proceedings of the21st International Conference on Machine Learning,Banff, Canada,2004,55-61.
    [106] K. Yu, W. Chu, S. Yu, V. Tresp, Z. Xu, Stochastic relational models for discriminativelink prediction [C]. Proceedings of Neural Information Precessing Systems, MIT Press,Cambridge, MA,2007,1553–1560.
    [107] R. R. Sarukkai. Link prediction and path analysis using markov chains [C]. In Intl. WorldWide Web Conf. on Computer Networks.2000.
    [108] B. Taskar, M. F. Wong, P. Abbeel, D. Koller. Link prediction in relational data [C].Proceeding of Neural Information Processing Systems,2003,659-666
    [109] M. Bilgic, G. Namata, L. Getoor. Combining collective classification and link prediction
    [C]. In IEEE ICDM Workshop on Mining Graphs and Complex Structures,2007.
    [110] A. Clauset, C. Moore, M. E. J. Newman. Hierarchical structure and the prediction ofmissing links in networks [J]. Nature,2008,453:98-101.
    [111] R. Guimerà, M. Sales-Pardo, Missing and spurious interactions and the reconstruction ofcomplex networks, Proc. Natl. Acad. Sci. USA106(2009)22073.
    [112] J. O’Madadhain, J. Hutchins, P. Smyth. Prediction and ranking algorithms for even-basednetwork data [C]. In Proceeding of the ACM SIGKDD International Conference onKnowledge Discovery and Data Mining,2005.
    [113] S. Hanneke, E. Xing. Discrete temporal models of social networks [C]. In Proceedings ofthe23rd International Conference on Machine Learning Workshop on Statistical NetworkAnalysis,2006.
    [114] B.Wolf, T.Y, J.Saia. A framework for analysis of dynamic social networks [J].Proceedings of the12th ACM SIGKDD International Conference on KnowledgeDiscovery and Data Mining, Philadelphia, PA.2006,523-528.
    [115] Z. Huang, D.K.J. Lin, The time-series link prediction problem with applications incommunication surveillance [J]. INFORMS Journal on Computing,2009,21:286-319.
    [116] Potgieter, A., K. A. April, R. J. E. Cooke, I. O. Osunmakinde. Temporality in linkprediction: Understanding social complexity [J]. Emergence Complexity Organ.2009,11(1):69-83.
    [117] U. Alon. Network motifs: theory and experimental approaches [J]. Nat. Rev. Genet.2007,8:450-462.
    [118] A. Mantrach, L. Yen, J. Callut, K. Francoisse, M. Shimbo, M. Saerens, Thesum-over-paths covariance kernel: a novel covariance measure between nodes of adirected graph [J], IEEE Transactions on Pattern Analysis and Machine. Intelligence,2010,32:1112-1126.
    [119] T. Murata, S. Moriyasu. Link prediction of social networks based on weighted proximitymeasure [C]. Proceedings of the IEEE/WIC/ACM International Conference on WebIntelligence, ACM Press, New York,2007.
    [120] L. Lü, T. Zhou. Link prediction in weighted networks: the role of weak ties [J], Europhys.Lett.,89,2010,18001.
    [121] J. Kunegis, A. Lommatzsch,C. Bauckhage. The slashdot zoo: mining a social networkwith negative edges [C]. Proceedings of WWW’2009, ACM Press, New York,2009.
    [122] R.V. Guha, R. Kumar, P. Raghavan, A. Tomkins. Propagation of trust and distrust [C].Proceedings of WWW’2004, ACM Press, New York,2004.
    [123] J. Leskovec, D. Huttenlocher, J. Kleinberg, Predicting positive and negative links inonline social networks [C]. Proceedings of WWW’2010, ACM Press, New York,2010.
    [124] Z.K.Zhang, C. Liu, Y.C. Zhang, T. Zhou, Solving the cold-start problem in recommendersystems with social tags [J]. Europhys. Lett.92,2010,28002.
    [125] X. Zhu, J. Kandola, Z. Ghahramani, and J. Lafferty. Nonparametric transforms of graphkernels for semi-supervised learning [C]. In Advances in Neural Information ProcessingSystems (NIPS17),2005,1641-1648.
    [126] X. Zhu, Z. Ghahramani, J. D. Lafferty. Semi-supervised learning using Gaussian fieldsand harmonic functions [C]. In ICML '03: Proceedings of the20th internationalconference on Machine learning,2003,912-919.
    [127] M. Belkin, P. Niyogi, V. Sindhwani. Manifold regularization: A geometric framework forlearning from labeled and unlabeled examples [J]. Journal of Machine Learning Research,2006,7:2399-2434.
    [128] X. Zhu, Z. Ghahramani. Learning from labeled and unlabeled data with label propagation
    [R]. Technical Report CMU-CALD-02-107, Carnegie Mellon University,2002.
    [129] Y. Bengio, O. B. Alleau, N. Le Roux. Label propagation and quadratic criterion.[M]. inSemi-Supervised Learning (O. Chapelle, B. Sch lkopf, and A. Zien, eds.), MIT Press,2006,193-216.
    [130] M. Szummer, T. Jaakkola. Partially labeled classification with Markov random walks [J].In Advances in Neural Information Processing Systems,2001,14:945-952.
    [131] T. Joachims. Transductive learning via spectral graph partitioning [C]. Proceedings of theInternational Conference on Machine Learning,2003,290-297.
    [132] O. Chapelle, B. Scholkopf, A. Zien, eds. Semi-Supervised Learning [M]. MIT Press,2006.
    [133] T. Joachims. Transductive inference for text classification using support vectormachines[C]. Proceedings of the International Conference on Machine Learning,1999,200-209
    [134] G. Fung, O. Mangasarian. Semi-supervised support vector machines for unlabeled dataclassification [J]. Optimization Methods and Software,2001,15:29-44.
    [135] H. J. Scudde. Probability of error of some adaptive pattern-recognition machines [J].IEEE Transactions on Information Theory,1965,11:363-371.
    [136] D. Yarowsky. Unsupervised word sense disambiguation rivalling supervised methods[C].Proceedings of the33rd Annual Meeting of the Association for ComputationalLinguistics,1995,189-196.
    [137] Riloff E, Wiebe J, Wilson T. Learning subjective nouns using extraction patternbootstrapping [C]. Proceedings of the Seventh Conference on Natural Language Learning(CoNLL-2003),2003,25-32.
    [138] MCCLOSKY D., CHARNIAK E., JOHNSON M. Effective self-training for parsing [C].In Proceedings of the North American ACL,2006,152-159.
    [139] C. Rosenberg, M. Hebert, H. Schneiderman. Semi-supervised self-training of objectdetection models [J]. Proceedings of the Workshop on Applications of Computer Vision,2005,1:29-36.
    [140] K. Nigam, A. K. McCallum, S. Thrun, T. M. Mitchell. Text classification from labeledand unlabeled documents using EM [J]. Machine Learning,2000,39(2/3):103-134.
    [141] BLUM. A, MITCHELL.T. Combining Labeled and Unlabeled Data with Co-training [C].In Proceedings of the Workshop on Computational Learning Theory,1998,92-100.
    [142] Goldman S, Zhou Y. Enhancing supervised learning with unlabeled data [C]. Proceedingof17th International Conf. on Machine Learning. San Francisco, CA: Morgan Kaufmann,2000,327-334.
    [143] Abney, S. Bootstrapping [C]. In ACL,2002,360–367.
    [144] Zhou Y, Goldman S. Democratic Co-learning [C]. Proceedings of the16th IEEEInternational Conference on Tools with Artificial Intelligence (ICTAI2004),2004,594-602.
    [145] Balcan, M.-F., Blum, A., Yang, K. Co-training and expansion: Towards bridging theoryand practice [C]. In NIPS17,2005,89–96.
    [146] Chawla N V, Karakoulas G. Learning from labeled and unlabeled data: An empiricalstudy across techniques and domains [J]. Journal of Artificial Intelligence Research,2005,23:331-366.
    [147] Wang W., Zhou, Z.-H. Analyzing co-training style algorithms [C]. In ECML,2007,454–465.
    [148] Yu S., Krishnapuram B., Rosales R., Steck H., Rao R. B. Bayesian co-training [C]. In NIPS 20,2008,1665–1672.
    [149] Sanei S, Lee T K M. A semi-supervised support vector machine for texture segmentation [C]. International Conference on Image Processing,2004:1522-4880.
    [150] O. Chapelle, A. Zien. Semi-supervised classification by low density separation [C]. Proceedings of the International Conference on Artificial Intelligence and Statistics,2005,57-64.
    [151] L. Xu, D. Schuurmans. Unsupervised and semi-supervised multi-class support vector machines [C]. In AAAI,2005,904-910.
    [152] O. Chapelle, M. Chi, A. Zien. A continuation method for semi-supervised SVMs [C]. In ICML '06: Proceedings of the 23rd international conference on Machine learning,2006,185-192.
    [153] R. Collobert, F. Sinz, J. Weston, L. Bottou. Large scale transductive SVMs [J]. Journal of Machine Learning Research,2006,7:1687-1712.
    [154] V. Sindhwani, S. S. Keerthi, O. Chapelle. Deterministic annealing for semi-supervised kernel machines [C]. In ICML '06: Proceedings of the 23rd international conference on Machine learning,2006,841-848.
    [155] O. Chapelle, V. Sindhwani, S. Keerthi. Branch and bound for semi-supervised support vector machines [J]. In B. Schölkopf, J. Platt, and T. Hoffman, editors, Advances in Neural Information Processing Systems 19,2007.
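    [156] Liao Dongping, Jiang Bin, Wei Xizhang, Li Xiang, Zhuang Zhaowen. A fast progressive transductive support vector machine classification learning algorithm [J]. Systems Engineering and Electronics,2007,29(01):87-91. (in Chinese)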
    [157] Xu, Z., Jin, R., Zhu, J., King, I., Lyu, M., Yang, Z. Adaptive Regularization for Transductive Support Vector Machine [C]. In: Proc. of NIPS,2009,2125–2133.
    [158] G. Li, S. C. H. Hoi, K. Chang. Two-view transductive support vector machines [C]. In SDM,2010,235–244.
    [159] A. Blum, J. Lafferty, M. Rwebangira, R. Reddy. Semi-supervised learning using randomized mincuts [C]. In Proceedings of the 21st International Conference on Machine Learning,2004,13–20.
    [160] M. Belkin, P. Niyogi. Semi-supervised learning on Riemannian manifolds [J]. Machine Learning,2004,56(1-3):209–239.
    [161] M. Belkin, P. Niyogi, V. Sindhwani. On manifold regularization [C]. In Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics,2005,17–24.
    [162] M. Fan, N. Gu, H. Qiao, B. Zhang. Sparse regularization for semi-supervised classification [J]. Pattern Recognition,2011,44(8):1777–1784.
    [163] K. Nigam, A. K. McCallum, S. Thrun, T. Mitchell. Text classification from labeled and unlabeled documents using EM [J]. Machine Learning,2000,39(2-3):103-134.
    [164] Lu Q, Getoor L. Link-based Classification using Labeled and Unlabeled Data [C]. Proceedings of the 20th International Conference on Machine Learning (ICML),2003.
    [165] Kuck H, Carbonetto P, Freitas N D. A Constrained Semi-supervised Learning Approach to Data Association [C]. ECCV,2004,1-12.
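    [166] Zhao Yue, Mu Zhichun, Li Xiali, Pan Xiuqin. A semi-supervised active DBN learning algorithm based on EM and classification loss [J]. Journal of Chinese Computer Systems,2007,28(4):656-660. (in Chinese)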
    [167] Grandvalet Y., Bengio Y. Semi-supervised learning by entropy minimization [C]. In Advances in Neural Information Processing Systems 17,2006,529–536.
    [168] McCallum, A., Pal, C., Druck, G., Wang, X. Multi-conditional learning: Generative/discriminative training for clustering and classification [C]. In Proc. Conf. on A.I.,2006,433-439.
    [169] Bellare, K., Druck, G., McCallum, A. Alternating projections for learning with expectation constraints [C]. In Proc. of Conf. on Uncertainty in Artificial Intelligence,2009.
    [170] G. Druck, A. McCallum. High-Performance Semi-Supervised Learning using Discriminatively Constrained Generative Models [C]. Proceedings of the 27th International Conference on Machine Learning, Haifa, Israel,2010.
    [171] Szummer M., Jaakkola T. Kernel expansions with unlabeled examples [J]. In Advances in Neural Information Processing Systems 13,2001,626–632.
    [172] A. Fujino, N. Ueda, K. Saito. A hybrid generative/discriminative approach to semi-supervised classifier design [C]. In Proceedings of the 20th National Conference on Artificial Intelligence,2005,764–769.
    [173] Lasserre J. A., Bishop C. M., Minka T. P. Principled hybrids of generative and discriminative models [C]. In Conf. on Computer Vision and Pattern Recognition,2006,87-94.
    [174] Suzuki J., Isozaki H. Semi-supervised sequential labeling and segmentation using giga-word scale unlabeled data [C]. In Proc. of Meeting of Assoc. for Computational Linguistics,2008,665-673.
    [175] Koo, T., Carreras, X., Collins, M. Simple semi-supervised dependency parsing [C]. In Proc. of Meeting of Assoc. for Computational Linguistics,2008,595-603.
    [176] M. Balcan, A. Blum. A discriminative model for semi-supervised learning [J]. JACM,2010,57(3):19-46.
    [177] D. Y. Zhou, B. Schölkopf. Regularization on discrete spaces [J]. Pattern recognition,2005:361-368.
    [178] A. B. Goldberg, X. J. Zhu, S. Wright. Dissimilarity in graph-based semi-supervised classification [C]. 7th international conference on artificial intelligence and statistics,2007.
    [179] A. Saffari, C. Leistner, H. Bischof. Regularized multi-class semi-supervised boosting [C]. IEEE conference on computer vision and pattern recognition,2009.
    [180] A.N. Tikhonov. Regularization of incorrectly posed problems [J]. Soviet Mathematics Doklady,1963,4:1624–1627.
    [181] G. Wahba. Spline models for observational data [M]. Society for Industrial and Applied Mathematics, Philadelphia, PA,1990.
    [182] T. Evgeniou, M. Pontil, T. Poggio. Regularization Networks and Support Vector Machines [J]. Advances in Computational Mathematics,2000,13(1):1–50.
    [183] M. Bertero. Regularization methods for linear inverse problems [M], in: Inverse Problems,ed. C.G. Talenti, Springer, Berlin,1986.
    [184] V.N. Vapnik. Estimation of Dependences Based on Empirical Data [M]. Springer, Berlin,1982.
    [185] D. Zhou. A regularization framework for learning from graph data [C]. Workshop on statistical relational learning at international conference on machine learning,2004.
    [186] Pearson K. On lines and planes of closest fit to systems of points in space [J]. Philosophical Magazine,1901,2(6):559-572.
    [187] Huang G.B., Zhu Q.Y., Siew C.K. Extreme learning machine: a new learning scheme of feedforward neural networks [C]. IEEE International Joint Conference on Neural Networks,2004,2:985-990.
    [188] Huang G.B., H.A. Babri. Upper bounds on the number of hidden neurons in feedforward networks with arbitrary bounded nonlinear activation functions [J]. Neural Networks, IEEE Transactions on,1998,9(1):224-229.
    [189] Huang G.B. Learning capability and storage capacity of two-hidden-layer feedforward networks [J]. Neural Networks, IEEE Transactions on,2003,14(2):274-281.
    [190] Huang G.B., L. Chen, C.K. Siew. Universal approximation using incremental constructive feedforward networks with random hidden nodes [J]. Neural Networks, IEEE Transactions on,2006,17(4):879-892.
    [191] F. Fouss, A. Pirotte, J.M. Renders, M. Saerens. Random-walk computation of similarities between nodes of a graph with application to collaborative recommendation [J]. Knowledge and Data Engineering, IEEE Transactions on,2007,19(3):355-369.
    [192] P. Sen, G.M. Namata, M. Bilgic, L. Getoor, B. Gallagher, T. Eliassi-Rad. Collective classification in network data [J]. AI Magazine,2008,29(3):93-106.
    [193] G. Holmes, A. Donkin, I.H. Witten. Weka: a machine learning workbench [C]. Proc. Second Australia and New Zealand Conference on Intelligent Information Systems,2007,6-25.
    [194] Richardson M., Domingos P. Markov logic networks [J]. Machine Learning,2006,62(1/2):19-24.
    [195] H.C. Peng, F.H. Long, C. Ding. Feature selection based on mutual information: criteria of max-dependency, max-relevance, and min-redundancy [J]. IEEE Transactions on Pattern Analysis and Machine Intelligence,2005,27(8):1226-1238.
    [196] Liu Jun. An Introduction to Social Network Analysis [M]. Social Sciences Academic Press,2004. (in Chinese)
    [197] Barrat A., Weigt, M. On the properties of small-world network models [J]. The European Physical Journal B-Condensed Matter,2000,13(3):547–560.
    [198] Newman M.E.J. Assortative mixing in networks [J]. Phys. Rev. Lett.,2002,89(20):208701.
    [199] Page Larry. PageRank: Bringing Order to the Web [R]. Stanford Digital Library Project, talk. August 18,1997.
    [200] B. Long, X. Wu, Z. M. Zhang, P.S. Yu. Community learning by graph approximation [C]. Proceedings of the 7th IEEE international conference on data mining,2007,232-241.
    [201] A. Bernstein, S. Clearwater, F. Provost. The relational vector-space model and industry classification [C]. In Proceedings of the Learning Statistical Models from Relational Data Workshop at the Nineteenth International Joint Conference on Artificial Intelligence (IJCAI),2003.
    [202] A. Grabowski, N. Kruszewska, R. A. Kosiński. Dynamic phenomena and human activity in an artificial society [J]. Physical Review E,2008,78:066110.
    [203] P. Holme, M. Huss. Role similarity based functional prediction in networked systems: application to the yeast proteome [J]. Journal of the Royal Society Interface,2005,2(4):327-333.
    [204] V. Batagelj, A. Mrvar. Pajek Datasets. http://vlado.fmf.uni-lj.si/pub/networks/data/default.htm.
    [205] http://www-personal.umich.edu/~mejn/netdata/
    [206] M. Girvan, M. E. J. Newman. Community structure in social and biological networks [J]. Proceedings of the National Academy of Sciences of the United States of America,2002,99(12):7821-7826.
    [207] M. E. J. Newman. Finding community structure in networks using the eigenvectors of matrices [J]. Phys. Rev. E,2006,74:036104.
    [208] D. J. Watts, S. H. Strogatz. Collective dynamics of small world networks [J]. Nature,1998,393:440-442.
    [209] A. Clauset, C. Moore, M. E. J. Newman. Hierarchical structure and the prediction of missing links in networks [J]. Nature,2008,453:98-101.
    [210] J. Kim, H.S. Shin, K. Shin, M. Lee. Robust algorithm for arrhythmia classification in ECG using extreme learning machine [J]. Biomedical engineering,2009,31(8):1-12.
    [211] V. Malanthi, N.S. Marimuthu, S. Baskar. A comprehensive evaluation of multicategory classification methods for fault classification in series compensated transmission line [J]. Neural Comput & Applic,2010,19:595-600.
    [212] Pearson, K. The problem of the Random Walk [J]. Nature,1905,72.
    [213] L. Lovász. Random walks on graphs: a survey [J]. Mathematical studies,1993,2:1-46.
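    [214] Yang Jian, Wang Jue, Zhong Ning. Laplacian semi-supervised regression on a manifold [J]. Journal of Computer Research and Development,2007,44(7):1121-1127. (in Chinese)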
    [215] L.Y. Lv, T. Zhou. Link prediction in complex networks: a survey [J]. Physica A,2011,390:1150-1170.
    [216] Jiawei Han, Micheline Kamber. Data Mining: Concepts and Techniques [M],2007.
    [217] B. Senliol, Z. Cataltepe, A. Sonmez. Collective classification with content and link noise. NIPS 2009 Workshop on Analyzing Networks and Learning with Graphs, December 11,2009, Whistler, BC, Canada.
    [218] C.Y. Lin, J.L. Koh, A.L.P. Chen. A better strategy of discovering link-pattern based communities by classical clustering methods. Lecture Notes in Artificial Intelligence: The 14th Pacific-Asia Conference on Knowledge Discovery and Data Mining, PAKDD 2010, Proceedings, LNAI 6118,2010.
