Research on a Knowledge Unit Mining Method Based on Citation Chains
Abstract
The knowledge units of a document are implicit, and there is no uniform standard for them. How to define a document's knowledge units and extract them effectively has increasingly drawn researchers' attention, and it is an important research direction in text mining. This paper uses citation association to extract the knowledge units of documents. The approach makes it possible to go past the document as a whole, reach into its interior, and evaluate the structure of its content, so that evaluation moves from the traditional document level down to the level of individual knowledge units. First, the paper reviews the state of research in China and abroad and analyzes the regularities of citation indexing. On that basis it extracts the associated feature sentences from the citing documents and, using a sentence similarity calculation, extracts the corresponding feature sentences from the cited references; the two sets are stored in two separate database tables. Second, it extracts triples from the feature sentences according to custom rules and represents them as ontologies, which are likewise stored in two further database tables. Third, it proposes a dual-weight ontology similarity calculation method for comparing the knowledge units of a document with those of its corresponding references. Fourth, it illustrates these steps with a concrete example and reports the experimental results. Finally, it summarizes the paper's contributions, analyzes its shortcomings, and discusses future work.
     The main contributions of this paper are: (1) building on the citation chain, it proposes using citation association to extract and link document knowledge units, addressing the earlier limitation that only whole documents could be evaluated; (2) it proposes a dual-weight ontology similarity calculation method that can quickly and accurately compute the similarity between the knowledge units of a document and those of its corresponding references.
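The abstract does not give the formulas behind the dual-weight similarity calculation, so the following is only a minimal sketch of what comparing two sets of triples with two kinds of weights might look like. Everything in it is an assumption made for illustration: the element weights in `ELEMENT_WEIGHTS`, the optional per-triple weights, the use of `difflib.SequenceMatcher` as a stand-in for a concept similarity measure, and all function names. It should not be read as the paper's actual method.

```python
from difflib import SequenceMatcher

# Hypothetical first weight: importance of each position inside a triple.
# The abstract does not state these values.
ELEMENT_WEIGHTS = {"subject": 0.4, "predicate": 0.2, "object": 0.4}


def term_similarity(a: str, b: str) -> float:
    """String similarity in [0, 1]; a stand-in for a real concept similarity measure."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def triple_similarity(t1, t2, weights=ELEMENT_WEIGHTS) -> float:
    """Weighted sum of element-wise similarities between two (subject, predicate, object) triples."""
    keys = ("subject", "predicate", "object")
    return sum(weights[k] * term_similarity(x, y) for k, x, y in zip(keys, t1, t2))


def knowledge_unit_similarity(citing, cited, triple_weights=None) -> float:
    """Match each citing triple to its best cited triple, then take a weighted average.

    triple_weights is the hypothetical second weight: one value per citing triple.
    """
    if not citing or not cited:
        return 0.0
    if triple_weights is None:
        triple_weights = [1.0] * len(citing)
    best = [max(triple_similarity(t, c) for c in cited) for t in citing]
    return sum(w * b for w, b in zip(triple_weights, best)) / sum(triple_weights)
```

A call might pass the triples extracted from a citing sentence and from the matched sentence in the reference, for example `knowledge_unit_similarity([("citation chain", "links", "knowledge unit")], [("citation", "links", "knowledge units")])`, and read the result as a score between 0 and 1.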
