With the arrival of the Big Data era, more and more scientific literature is published as electronic documents on the Internet. This not only makes the literature more widely accessible but also accelerates scientific research, helping researchers to "stand on the shoulders of giants". Along with these changes, however, the problem of high- and low-quality papers being intermingled in the vast body of electronic academic documents has become increasingly conspicuous. We therefore face new challenges in literature visualization, retrieval, management, and application, which have become a research hotspot in bibliometrics and knowledge management.
     This thesis focuses on methods for scientific literature retrieval grounded in citation analysis, text mining, and information retrieval, drawing on topic models, ranking algorithms, language models, and graph theory. First, a method for visualizing domain knowledge is presented. Next, a ranking algorithm for scientific literature is developed by analyzing the semantic knowledge contained in citation contexts. Finally, a scientific literature retrieval model is implemented. All of these methods are evaluated experimentally. The main research contributions are as follows:
     1. Propose a new method for computing the distance between citation probability distributions based on citation analysis, and apply it to the visualization of literature knowledge domains.
     2. Extract the text of citation contexts and use the Labeled-LDA topic model to derive two prior probabilities, the vertex weights and the edge weights, for a directed, weighted citation network. On this basis, propose a Context-Based Ranking Algorithm (CBRA) that improves on the traditional PageRank algorithm.
     3. Apply CBRA to author authority ranking. Author authority rankings are built for each topic and used to refine the literature rankings, so that a paper's rank depends not only on the links in the citation network but also on the authority of its authors.
     4. Building on CBRA, improve the traditional language-model-based information retrieval model, and then develop a topic-based literature retrieval system.
     5. Apply CBRA to passage retrieval and build a topic-based passage retrieval system, which can improve the accuracy and relevance of literature retrieval.
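Item 1 does not state which distance is used between citation probability distributions. Purely as a stand-in, the sketch below treats each paper's reference list as a probability distribution over cited works and compares two papers with the Jensen-Shannon divergence (base-2 logs, so values fall in [0, 1]). The helper names `citation_distribution` and `js_divergence` and the labels `r1` to `r3` are hypothetical, and the thesis's own measure may well differ.

```python
import math
from collections import Counter
from typing import Dict, Iterable

# Hypothetical sketch: Jensen-Shannon divergence is a stand-in, not the thesis's measure.


def citation_distribution(cited: Iterable[str]) -> Dict[str, float]:
    """Turn a paper's list of cited works into a probability distribution."""
    counts = Counter(cited)
    total = sum(counts.values())
    return {work: n / total for work, n in counts.items()}


def js_divergence(p: Dict[str, float], q: Dict[str, float]) -> float:
    """Jensen-Shannon divergence with base-2 logs: 0 for identical, 1 for disjoint."""
    support = set(p) | set(q)
    # Mixture distribution M = (P + Q) / 2 over the union of cited works.
    m = {k: 0.5 * (p.get(k, 0.0) + q.get(k, 0.0)) for k in support}

    def kl_to_m(a: Dict[str, float]) -> float:
        # KL(a || M); terms with a[k] == 0 contribute nothing.
        return sum(a[k] * math.log2(a[k] / m[k]) for k in a if a[k] > 0)

    return 0.5 * kl_to_m(p) + 0.5 * kl_to_m(q)


paper_x = citation_distribution(["r1", "r2", "r2"])
paper_y = citation_distribution(["r2", "r3"])
print(round(js_divergence(paper_x, paper_y), 4))
```

A pairwise matrix of such values could then be passed to a layout method such as multidimensional scaling to draw a knowledge-domain map; that step is not shown.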
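Item 2 names the two priors CBRA takes from Labeled-LDA but not its update rule. A minimal sketch, assuming the vertex weights act as the teleportation (personalization) distribution and the edge weights decide how a paper's rank is split among the papers it cites, is a weighted, personalized PageRank computed by power iteration. The function name `weighted_pagerank`, the damping factor 0.85, the dangling-node handling, and the toy weights are illustrative assumptions, not the thesis's specification.

```python
from typing import Dict, Tuple

# Hypothetical sketch of a prior- and edge-weighted PageRank, not the exact CBRA formula.


def weighted_pagerank(
    vertex_weight: Dict[str, float],
    edge_weight: Dict[Tuple[str, str], float],
    damping: float = 0.85,
    tol: float = 1e-10,
    max_iter: int = 200,
) -> Dict[str, float]:
    """Personalized, edge-weighted PageRank by power iteration.

    vertex_weight: assumed prior weight per paper, used as the teleport distribution.
    edge_weight: (citing, cited) -> assumed weight of that citation.
    """
    nodes = list(vertex_weight)
    total = sum(vertex_weight.values())
    prior = {n: vertex_weight[n] / total for n in nodes}

    # Keep only positive-weight citations and sum each paper's outgoing weight.
    edges = {e: w for e, w in edge_weight.items() if w > 0}
    out_total = {n: 0.0 for n in nodes}
    for (src, _dst), w in edges.items():
        out_total[src] += w

    rank = dict(prior)
    for _ in range(max_iter):
        # Rank held by papers that cite nothing is redistributed according to the prior.
        dangling = sum(rank[n] for n in nodes if out_total[n] == 0.0)
        new = {n: ((1 - damping) + damping * dangling) * prior[n] for n in nodes}
        # Each paper passes rank to the papers it cites, in proportion to edge weight.
        for (src, dst), w in edges.items():
            new[dst] += damping * rank[src] * w / out_total[src]
        delta = sum(abs(new[n] - rank[n]) for n in nodes)
        rank = new
        if delta < tol:
            break
    return rank


papers = {"A": 1.0, "B": 1.0, "C": 2.0}
citations = {("A", "C"): 2.0, ("A", "B"): 1.0, ("B", "C"): 1.0}
scores = weighted_pagerank(papers, citations)
print(sorted(scores, key=scores.get, reverse=True))
```

With uniform vertex weights and equal edge weights this reduces to ordinary PageRank, which is one way to check that the extra priors are the only change.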
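Item 4 does not say how CBRA enters the language model. One common pattern, shown here only as an assumption, is to use each paper's ranking score as the document prior P(d) in query-likelihood retrieval, scoring score(q, d) = log P(d) + sum over query terms w of log P(w | d), with P(w | d) estimated by Dirichlet smoothing. The name `rank_documents`, the whitespace tokenizer, the default `mu=2000`, and the toy documents and priors are hypothetical.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple

# Hypothetical sketch: query likelihood with a ranking score used as the document prior.


def rank_documents(
    query: str,
    docs: Dict[str, str],
    prior: Dict[str, float],
    mu: float = 2000.0,
) -> List[Tuple[str, float]]:
    """Rank docs by log P(d) + sum_w log P(w | d) with Dirichlet smoothing."""
    tokenized = {d: text.lower().split() for d, text in docs.items()}
    collection: Counter = Counter()
    for toks in tokenized.values():
        collection.update(toks)
    coll_len = sum(collection.values())

    results = []
    for d, toks in tokenized.items():
        tf = Counter(toks)
        # The prior need not be normalized: a constant factor does not change the order.
        score = math.log(prior[d])
        for w in query.lower().split():
            p_coll = collection[w] / coll_len
            if p_coll == 0.0:
                continue  # term unseen in the collection: skipped for every doc alike
            score += math.log((tf[w] + mu * p_coll) / (len(toks) + mu))
        results.append((d, score))
    results.sort(key=lambda pair: pair[1], reverse=True)
    return results


docs = {
    "p1": "pagerank citation network ranking",
    "p2": "topic model for text",
    "p3": "citation context ranking with pagerank",
}
prior = {"p1": 0.5, "p2": 0.3, "p3": 0.2}
print(rank_documents("pagerank citation", docs, prior, mu=10))
```

The example uses `mu=10` only because the toy documents are a few words long; larger corpora usually call for a larger value.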
