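Research on Web Knowledge Association Mining-based Ontology Evolution

Original title: 基于Web知识关联挖掘的本体进化研究
Author: Wu Yizhan (吴一占)
Degree level: Master's
Discipline: Management Science and Engineering
Keywords: Ontology evolution; Web mining; Association mining
Degree year: 2011
Supervisor: Ma Jing (马静)
Discipline code: 1201
Degree-granting institution: Nanjing University of Aeronautics and Astronautics (南京航空航天大学)
Thesis submission date: 2011-03-01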

Abstract

Since the concept of the Semantic Web was proposed, ontology technology, a key to realizing it, has attracted wide attention and study. It is now widely applied in knowledge representation, knowledge management, knowledge sharing, and knowledge reuse, and has become one of the most effective methods for knowledge-oriented information processing. However, maintaining ontologies by hand is too difficult and too costly, which greatly limits the use and promotion of ontology technology in broader fields. Existing ontology evolution techniques offer many solutions, but because they are weak at mining the associations between concepts in an ontology, they still do not solve the problem well.
     This thesis first analyzes the goals of ontology evolution in this domain, based on the application requirements of a Web information acquisition and analysis system and on the constituent elements of an ontology, and studies existing Web mining techniques with ontology evolution in mind. It then designs an ontology evolution scheme based on Web knowledge association mining and discusses the technical solution in two parts: the process framework and the implementation algorithms. For the process framework, the thesis proposes combining the ontology evolution system with the Web information extraction and mining system so that each drives the other, with ontology evolution completed during the repeated cycles of Web information extraction and mining. For the algorithms, the thesis takes the characteristics of Web information into account and, aiming at more accurate and more efficient ontology learning, improves and designs the specific algorithm for each step. Finally, a prototype system is developed according to the designed framework. It verifies that the proposed ontology evolution scheme is stable and effective, that it alleviates to some extent the difficulty of maintaining ontologies manually, and that it meets the goals of the study.
     The thesis makes the following contributions. First, it designs an algorithmic strategy for Web knowledge association mining that obtains the association relationships between entities, addressing the weakness of existing ontology evolution in handling entity associations. Second, it combines Web information extraction and mining with ontology evolution in a closed-loop, mutually optimizing system framework that meets the needs of both the information extraction and mining system and ontology evolution. Third, targeting the application characteristics of Web information extraction and mining, it proposes improved algorithms in the processing flow, including a text relevance judgment algorithm based on Bayesian filtering and the introduction of dynamic content recognition, which make ontology evolution more efficient and more accurate.
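
The abstract names a Bayesian-filtering text relevance judgment but does not give its details. As a rough illustration only, the sketch below shows one common way such a filter can be built: a two-class multinomial naive Bayes classifier with add-one smoothing. The class name `BayesRelevanceFilter`, the `tokenize` helper, the labels, and the training sentences are all hypothetical and are not taken from the thesis.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase the text and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


class BayesRelevanceFilter:
    """Hypothetical two-class naive Bayes filter (not the thesis's algorithm)."""

    def __init__(self):
        self.word_counts = {"relevant": Counter(), "irrelevant": Counter()}
        self.doc_counts = {"relevant": 0, "irrelevant": 0}

    def train(self, text, label):
        """Add one labelled document to the word and document counts."""
        self.word_counts[label].update(tokenize(text))
        self.doc_counts[label] += 1

    def log_score(self, text, label):
        """Return log P(label) + sum of log P(token | label), smoothed."""
        vocab = set(self.word_counts["relevant"]) | set(self.word_counts["irrelevant"])
        total_docs = sum(self.doc_counts.values())
        total_words = sum(self.word_counts[label].values())
        # Smoothed class prior, so an untrained class never yields log(0).
        score = math.log((self.doc_counts[label] + 1) / (total_docs + 2))
        for token in tokenize(text):
            count = self.word_counts[label][token]
            # Add-one smoothing, with one extra slot reserved for unseen words.
            score += math.log((count + 1) / (total_words + len(vocab) + 1))
        return score

    def is_relevant(self, text):
        """Judge a text relevant when the 'relevant' class scores higher."""
        return self.log_score(text, "relevant") > self.log_score(text, "irrelevant")


# Toy training data, invented for illustration.
f = BayesRelevanceFilter()
f.train("ontology concept relation semantic web", "relevant")
f.train("owl ontology evolution class property", "relevant")
f.train("cheap flights hotel discount booking", "irrelevant")
f.train("football match score league", "irrelevant")

print(f.is_relevant("semantic web ontology evolution"))
print(f.is_relevant("hotel booking discount"))
```

In a pipeline like the one the abstract describes, a filter of this kind would presumably sit before the mining step, so that only pages judged relevant to the domain feed the ontology learner. That placement is an assumption made here, not a detail stated in the source.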

