Research on a Thesaurus-Based Method for Building Domain Ontologies
Abstract
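Ontologies can be built in two ways: manually, by domain experts, or automatically or semi-automatically, by machine learning. The manual approach relies mainly on human effort, and the semantic content of the resulting ontology depends on the builder's personal knowledge, so it only eases the knowledge bottleneck. The learning-based approach acquires knowledge automatically from large volumes of information and is an important route to resolving that bottleneck at its root. Research on automatic ontology construction is growing, but strong domain dependence, a low degree of automation, and unsatisfactory learning results remain largely unsolved. The automatic construction of Chinese ontologies in particular has received very little attention, in China or abroad.
     Building on an in-depth study of current ontology construction techniques and ontology learning methods, this thesis therefore proposes a new approach to building domain ontologies automatically and focuses on the following:
     (1) A thesaurus-based model of a domain ontology learning system is proposed. The model combines thesaurus-to-ontology conversion with the relation acquisition techniques of ontology learning. It exploits the inherent strengths of a thesaurus to compensate for the poor results that ontology learning achieves when acquiring concepts and taxonomic relations. On that basis, it learns relations from plain-text sources to obtain non-taxonomic relations between concepts, so that the resulting domain ontology carries richer semantic information.
     (2) A thesaurus-based domain ontology learning system is designed and implemented. It consists of a thesaurus conversion module and a non-taxonomic relation learning module. For the conversion module, the thesis formulates a set of rules for converting a domain thesaurus into an ontology and uses them to convert a thesaurus into an initial domain ontology. The relation learning module takes an extended association rule mining method as its theoretical basis, applies Chinese natural language processing techniques to acquire relations from a Chinese corpus, and adds the learned relations to the initial ontology.
     (3) The system is used to build a domain ontology, which is then evaluated. No standard for ontology evaluation has yet emerged, so the thesis evaluates the automatically built ontology using only a few indicators: reusability, extensibility, and the reference degree of associative relations.
     The thesaurus-based domain ontology learning system designed and implemented in this thesis offers a useful reference for the automatic construction of Chinese domain ontologies and supports practical applications of semantic knowledge based on Chinese ontologies.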
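The conversion step described in (2) can be illustrated with a minimal sketch. The abstract does not list the thesis's actual conversion rules, so the mapping below is an assumption based on the conventional thesaurus relations (broader term, narrower term, related term, used-for). The names `ThesaurusEntry`, `thesaurus_to_triples`, and the property `ex:relatedTo` are hypothetical and appear only for illustration.

```python
from dataclasses import dataclass, field
from typing import List, Set, Tuple

Triple = Tuple[str, str, str]


@dataclass
class ThesaurusEntry:
    """One thesaurus record (hypothetical structure, not from the thesis)."""
    term: str
    broader: List[str] = field(default_factory=list)   # BT: broader terms
    narrower: List[str] = field(default_factory=list)  # NT: narrower terms
    related: List[str] = field(default_factory=list)   # RT: related terms
    used_for: List[str] = field(default_factory=list)  # UF: non-preferred synonyms


def thesaurus_to_triples(entries: List[ThesaurusEntry]) -> Set[Triple]:
    """Apply assumed conversion rules and return ontology statements as triples."""
    triples: Set[Triple] = set()
    for entry in entries:
        # Assumed rule 1: every preferred term becomes a class.
        triples.add((entry.term, "rdf:type", "owl:Class"))
        # Assumed rule 2: BT/NT become subclass axioms (taxonomic relations).
        for parent in entry.broader:
            triples.add((entry.term, "rdfs:subClassOf", parent))
        for child in entry.narrower:
            triples.add((child, "rdfs:subClassOf", entry.term))
        # Assumed rule 3: RT becomes a generic, symmetric association.
        # Sorting the pair stores each association only once.
        for other in entry.related:
            first, second = sorted((entry.term, other))
            triples.add((first, "ex:relatedTo", second))
        # Assumed rule 4: UF synonyms become alternative labels.
        for synonym in entry.used_for:
            triples.add((entry.term, "skos:altLabel", synonym))
    return triples


if __name__ == "__main__":
    sample = [
        ThesaurusEntry(
            term="mobile communication",
            broader=["communication"],
            narrower=["cellular network"],
            related=["base station"],
            used_for=["mobile comms"],
        )
    ]
    for triple in sorted(thesaurus_to_triples(sample)):
        print(triple)
```

Treating every broader-term link as a subclass axiom is a simplification: thesaurus hierarchies often mix genus-species and part-whole links, which a real rule set would need to tell apart. The generic `ex:relatedTo` association is the kind of unlabelled link that the relation learning module is meant to enrich.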
Ontology building methods fall into two types: manual construction by experts, and automatic construction based on machine learning. The first relies on manual work, so the semantics of the ontology depend on the builder's personal knowledge, and it only mitigates the knowledge bottleneck. The second obtains knowledge automatically from large amounts of information through machine learning and is a fundamental solution to the knowledge bottleneck in ontology building. Although more and more organizations are studying ontology building, a low level of automation, dependence on experts, and other issues remain unresolved. Research on building Chinese ontologies in particular is still insufficient.
     Therefore, building on a study of current ontology building and learning methods, this paper focuses on the following aspects:
     1. It puts forward a model of a thesaurus-based automated ontology building system. The model combines thesaurus conversion with relation learning techniques. It inherits the advantages of the thesaurus to make concepts and category relations more exact, then uses relation learning to extract non-category relations from document resources, completing the semantic information about the relations between concepts.
     2. It designs and develops a thesaurus-based automated ontology building system. The system includes a thesaurus-conversion module and a relation-learning module. In the first module, a set of thesaurus-conversion rules is presented as the basis for converting a thesaurus into an initial ontology. The second module is guided by association rule mining, uses Chinese natural language processing techniques to extract relations from document resources, and adds these relations to the initial ontology.
     3. It evaluates the ontology built by the thesaurus-based automated ontology building system in terms of scalability, reusability, and reference degree.
     This research provides a valuable reference for Chinese ontology building and has a positive effect on ontology-based applications.
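The relation-learning module is described as guided by association rule mining. A minimal sketch of the basic idea follows, assuming each sentence of the corpus has already been segmented and reduced to the set of ontology concepts it mentions. It shows plain support and confidence filtering over concept pairs, not the extended method used in the thesis, whose details the abstract does not give. The function name `mine_concept_relations` and the threshold values are hypothetical.

```python
from collections import Counter
from itertools import combinations
from typing import Iterable, List, Set, Tuple

Rule = Tuple[str, str, float, float]  # (antecedent, consequent, support, confidence)


def mine_concept_relations(
    transactions: Iterable[Set[str]],
    min_support: float = 0.2,
    min_confidence: float = 0.6,
) -> List[Rule]:
    """Return concept pairs whose co-occurrence passes both thresholds.

    Each transaction is the set of concepts found in one sentence.
    """
    transactions = [set(t) for t in transactions]
    total = len(transactions)
    if total == 0:
        return []

    concept_count: Counter = Counter()
    pair_count: Counter = Counter()
    for concepts in transactions:
        concept_count.update(concepts)
        # Sort so that (a, b) and (b, a) are counted under one key.
        pair_count.update(combinations(sorted(concepts), 2))

    rules: List[Rule] = []
    for (a, b), together in pair_count.items():
        support = together / total
        if support < min_support:
            continue
        # Test the rule in both directions: a -> b and b -> a.
        for antecedent, consequent in ((a, b), (b, a)):
            confidence = together / concept_count[antecedent]
            if confidence >= min_confidence:
                rules.append((antecedent, consequent, support, confidence))
    return rules


if __name__ == "__main__":
    sentences = [
        {"base station", "antenna"},
        {"base station", "antenna"},
        {"base station", "switch"},
        {"antenna"},
    ]
    for rule in mine_concept_relations(sentences, min_support=0.3):
        print(rule)
```

In a pipeline like the one the abstract describes, pairs that are already linked by a taxonomic relation in the initial ontology would presumably be filtered out, and the surviving pairs added as candidate non-category relations.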