An Adjustable Preferential Attachment Model of Citation Networks and Its Applications
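Abstract
A citation network is the network formed by the citation relationships among papers. Analyzing it is one of the key means of tracing the history of science and of evaluating and predicting the significance, scale, and trends of scientific development. Citation network research has two main components: measuring structural properties and building evolution models. Existing evolution models cannot jointly explain five structural properties: preferential attachment, node aging, the scale-free property, the sleeping beauty phenomenon, and high clustering. The goal of this thesis is to build a citation network evolution model that explains all five properties, and to use the evolution rules revealed by the model to predict how citation networks develop. The main results are:
     1. An adjustable preferential attachment model (APA model) is designed to describe citation networks. The thesis first models the two main mechanisms by which citation networks form: node aging and edge copying. It then uses analytical calculation and numerical simulation to analyze how the parameters of these two mechanisms affect network structure, and derives the relationship between the two mechanisms and the five structural properties. The analysis shows that the APA model describes citation networks well and explains each of the five properties.
     2. A parameter estimation method for the APA model is constructed to further verify that the model describes real citation networks reasonably. Model parameters are first estimated from real citation networks. Simulated networks are then generated from those parameters and compared with the real networks on the five structural properties. The comparison further shows that the APA model reasonably describes real citation networks and explains all five of their structural properties. Finally, the thesis analyzes why different real citation networks have different model parameters. Because the APA model describes real citation networks reasonably, it can reveal the rules by which they evolve.
     3. A research hot-topic prediction algorithm based on the APA model is proposed. Following the growth pattern of citation counts revealed by the APA model, the thesis first proposes a prediction algorithm that relies on a paper's most recent citation count. Experiments show that its accuracy in predicting research hot topics is higher than that of other prediction algorithms. Rank aggregation is then used to further verify that predicting hot topics from recent citation counts alone is reasonable. Finally, ranking by recent citation count is added to a paper search engine and combined with query expansion, which helps users more quickly understand the specific research content and hot topics of a specified research field.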
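The two formation mechanisms named in result 1, node aging and edge copying, can be illustrated with a toy growth simulation. The abstract does not give the APA model's equations, so everything below is an assumption made for illustration: the exponential aging kernel, the `(citations + 1)` attractiveness term, and the parameters `refs_per_paper`, `aging_rate`, and `copy_prob` are hypothetical and are not the thesis's definitions.

```python
import math
import random


def grow_citation_network(n_papers, refs_per_paper=3, aging_rate=0.05,
                          copy_prob=0.5, seed=0):
    """Grow a toy citation network; returns {paper: sorted list of cited papers}.

    Illustrative sketch only: the functional forms and parameters are
    assumptions, not the APA model as defined in the thesis.
    """
    rng = random.Random(seed)
    cites = {0: []}        # paper id -> papers it cites
    in_degree = {0: 0}     # paper id -> number of citations received
    for new in range(1, n_papers):
        older = list(range(new))
        # Preferential attachment damped by node aging (assumed kernel):
        # weight = (citations + 1) * exp(-aging_rate * age).
        weights = [(in_degree[p] + 1) * math.exp(-aging_rate * (new - p))
                   for p in older]
        chosen = set()
        target = min(refs_per_paper, new)
        while len(chosen) < target:
            anchor = rng.choices(older, weights=weights, k=1)[0]
            chosen.add(anchor)
            # Edge copying: with probability copy_prob, also cite one of the
            # anchor's own references, which closes a triangle.
            if (cites[anchor] and len(chosen) < target
                    and rng.random() < copy_prob):
                chosen.add(rng.choice(cites[anchor]))
        cites[new] = sorted(chosen)
        in_degree[new] = 0
        for p in chosen:
            in_degree[p] += 1
    return cites
```

Calling `grow_citation_network(200)` yields a network in which every paper cites only earlier papers. In this sketch, raising `aging_rate` shifts citations toward recent papers, and raising `copy_prob` creates more triangles, which is the kind of parameter sensitivity the abstract says was studied analytically and numerically.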
A citation network is the network formed by citation relations between papers. Analyzing citation networks is one of the key methods for reviewing the history of science and for evaluating and predicting the value, scale, and trends of scientific development. Two important goals of this analysis are characterizing structural properties and modeling network evolution. Existing models fail to explain the following structural properties of citation networks simultaneously: preferential attachment, node aging, the scale-free property, the sleeping beauty phenomenon, and high clustering. This thesis proposes an evolution model of citation networks that explains all of these properties, and applies the evolution rules indicated by the model to predict the development of citation networks. The main contributions are as follows:
     1. This thesis proposes the Adjustable Preferential Attachment Model (APA Model) to describe citation networks. It first models the two major mechanisms that form citation networks: node aging and edge copying. The influence of the parameters of these two mechanisms on network structure is then studied through both analytical calculation and numerical simulation. The relationships between the two mechanisms and the five structural properties of citation networks are also analyzed. The results show that the APA Model describes citation networks well and explains each of the five structural properties.
     2. This thesis presents a parameter estimation method for the APA Model to validate that the model reasonably describes real citation networks. Parameters are estimated from real citation networks, simulated networks are generated from those parameters, and the real and simulated networks are compared on the five structural properties. The results show that the APA Model reasonably describes real citation networks and explains all five structural properties simultaneously. The reasons why different real citation networks yield different parameters are also discussed. Because it describes real citation networks reasonably, the APA Model can indicate the evolution rules of citation networks.
     3. Based on the APA Model, this thesis proposes an algorithm to predict prospective hot research topics. Following the citation growth rules indicated by the APA Model, the probability that a paper obtains new citations is predicted from its recent citations. Experimental results show that the new algorithm achieves higher prediction accuracy than other prediction algorithms. Rank aggregation further confirms that prospective hot research topics can be reasonably predicted using recent citations alone. Finally, ranking by recent citations is integrated into a literature search engine together with query expansion, which helps users more quickly grasp the specific research content and hot research topics of a user-specified field.
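The idea in contribution 3, ranking papers by recent citations and then checking that ranking against others through rank aggregation, can be sketched as follows. This is a minimal illustration under stated assumptions: the abstract names neither the time window nor the aggregation rule, so the `window` parameter and the Borda count used here are hypothetical choices, and the function names are invented for the example.

```python
from collections import Counter


def rank_by_recent_citations(citations, current_year, window=2):
    """Rank papers by citations received in the last `window` years.

    `citations` is an iterable of (cited_paper, citing_year) pairs.
    Papers with no citation inside the window are omitted.
    """
    recent = Counter(paper for paper, year in citations
                     if current_year - window < year <= current_year)
    # Sort by count (descending), then by paper id to break ties.
    return sorted(recent, key=lambda p: (-recent[p], p))


def borda_aggregate(rankings):
    """Fuse several rankings with a Borda count (an assumed aggregation rule)."""
    items = {p for ranking in rankings for p in ranking}
    scores = Counter()
    for ranking in rankings:
        # The top item earns len(items) points, the next one point fewer;
        # items missing from a ranking earn nothing from it.
        for position, paper in enumerate(ranking):
            scores[paper] += len(items) - position
    return sorted(items, key=lambda p: (-scores[p], p))
```

For example, given citation pairs for papers "A", "B", and "C", `rank_by_recent_citations(pairs, 2008)` orders papers by citations received in 2007 and 2008 only, while a lifetime-count ranking can be built the same way with a very large `window`. Passing both lists to `borda_aggregate` shows how far the fused order departs from the recent-only order.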