Abstract
Predicting research collaboration in a big-data environment requires learning and discovering associations among researchers automatically from massive data resources, so that prediction becomes both more efficient and more effective. First, a co-authorship network is built from a massive dataset, with co-authorship taken as the indicator of research collaboration. Next, a deep-learning-based network representation learning method (network embedding) learns each researcher's context within the network and produces a dense, low-dimensional vector representation for each researcher. Finally, vector similarity indices are used to compute the semantic similarity between researchers, which supports research collaboration prediction and recommendation. Experiments in the field of Library and Information Science verify that the method improves the accuracy and effectiveness of research collaboration prediction and yields better association recommendations. The method enriches and extends complex-network-based information analysis from the perspective of data science.
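The three steps of the abstract (build the co-authorship network, embed each researcher, rank candidate collaborators by vector similarity) can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract does not name the embedding algorithm or the similarity index, so the sketch assumes cosine similarity and swaps in a simple deterministic stand-in for the embedding step, namely a truncated SVD of averaged multi-step random-walk proximities. The function names, parameters, and toy paper list are all hypothetical, and NumPy is assumed to be available.

```python
from itertools import combinations

import numpy as np


def build_coauthor_graph(papers):
    """Return (authors, adj) where adj[i, j] counts papers co-written by i and j."""
    authors = sorted({a for paper in papers for a in paper})
    index = {a: i for i, a in enumerate(authors)}
    adj = np.zeros((len(authors), len(authors)))
    for paper in papers:
        for a, b in combinations(sorted(set(paper)), 2):
            adj[index[a], index[b]] += 1
            adj[index[b], index[a]] += 1
    return authors, adj


def embed(adj, dim=3, steps=3):
    """Stand-in embedding (assumption): truncated SVD of the average of the
    1..steps-step random-walk transition matrices."""
    degree = adj.sum(axis=1, keepdims=True)
    trans = adj / np.maximum(degree, 1)  # row-normalized transition matrix
    power = np.eye(len(adj))
    proximity = np.zeros_like(adj)
    for _ in range(steps):
        power = power @ trans
        proximity += power / steps
    u, s, _ = np.linalg.svd(proximity)
    return u[:, :dim] * np.sqrt(s[:dim])  # one dense, low-dimensional row per author


def cosine(u, v):
    """Cosine similarity, returning 0.0 if either vector is all zeros."""
    denom = np.linalg.norm(u) * np.linalg.norm(v)
    return float(u @ v / denom) if denom > 0 else 0.0


def recommend(author, authors, adj, vectors, top_k=3):
    """Rank authors who have not yet co-authored with `author` by similarity."""
    i = authors.index(author)
    scores = [
        (cosine(vectors[i], vectors[j]), authors[j])
        for j in range(len(authors))
        if j != i and adj[i, j] == 0
    ]
    return [name for _, name in sorted(scores, reverse=True)[:top_k]]


# Toy data, invented for illustration: each inner list is one paper's author list.
papers = [
    ["Ann", "Bob"], ["Bob", "Cat"], ["Ann", "Cat", "Dan"],
    ["Dan", "Eve"], ["Eve", "Fay"], ["Fay", "Gus"], ["Eve", "Gus"],
]
authors, adj = build_coauthor_graph(papers)
vectors = embed(adj, dim=3)
print(recommend("Bob", authors, adj, vectors))
```

In practice the `embed` step would be replaced by a learned method such as DeepWalk, node2vec, or LINE; the graph construction and the similarity-based ranking around it would stay the same.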