A Rough Concept Lattice Method for Building Chinese Ontologies for the Semantic Web
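Abstract
A Semantic Web ontology is a knowledge base that supports the actual operation of the Semantic Web. It formally defines the knowledge commonly accepted within a domain and the relationships among that knowledge, expressed concretely as the domain's commonly accepted concepts and the relationships among them. A concept lattice formally defines concepts and the relationships among concepts. Although concept lattices and ontologies are both knowledge bases, the former is grounded in mathematical rigor while the latter pursues convenience of application. The development route "actual domain → concept lattice → ontology" is conducive to building high-quality Semantic Web ontologies.
     Most existing ontologies are in English, and there is no fully functional tool for building Chinese ontologies. The real world is uncertain, so human cognition of it inevitably exhibits fuzziness and roughness; the rough concept is one of the basic forms of human cognition. A rough concept lattice is a knowledge base expressed as rough concepts and the relationships among them. Following the route "actual domain → rough concept lattice → Chinese ontology", and then applying the result to support a Semantic Web that operates closer to the real world, is the research goal and technical route of this thesis. Because the formal context determines the structural complexity of the rough concept lattice built from it, this thesis focuses on the formal context extracted from the data source.
     First, extract the formal context. For practical reasons, this thesis takes unstructured Chinese text as the data source for the formal context. To obtain a concise formal context, the notion of a set of similar-word sets is proposed to reduce the redundancy introduced by treating single words individually.
     Next, reduce the formal context. The reduction algorithm for the formal contexts of concept lattices is extended.
     Then, extract rough formal concepts. The rough formal concepts are extracted from the reduced formal context.
     After that, build the rough concept lattice. Because rough formal concepts and formal concepts are tuples of different sizes, concept lattice construction algorithms are studied and extended to build the rough concept lattice.
     Then, convert the rough concept lattice into a Chinese Semantic Web ontology. Building on research into applying formal concept analysis to ontology construction, the rough concept lattice is processed to generate an ontology prototype, and the ontology is described in an ontology description language, completing the conversion from rough concept lattice to Chinese Semantic Web ontology.
     Finally, validate with a case study. Using 214 common Chinese texts in the traffic category provided by Sogou Labs as the data source, the feasibility and practical value of the rough concept lattice method for building Chinese Semantic Web ontologies are verified.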
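The first step above, extracting a concise formal context from Chinese text, can be illustrated with a minimal sketch. It assumes the texts are already word-segmented and that the similar-word sets are given; the abstract does not say how similarity is computed, so the groups below are hand-written placeholders, and the function name `build_formal_context` is hypothetical.

```python
from typing import Dict, FrozenSet, List, Set


def build_formal_context(
    docs: Dict[str, List[str]],
    similar_word_sets: List[Set[str]],
) -> Dict[str, FrozenSet[int]]:
    """Map each document (object) to the indices of the similar-word sets
    (attributes) that it mentions at least once."""
    # Reverse index: word -> index of the similar-word set that contains it.
    word_to_set = {
        word: index
        for index, group in enumerate(similar_word_sets)
        for word in group
    }
    return {
        doc_id: frozenset(word_to_set[t] for t in tokens if t in word_to_set)
        for doc_id, tokens in docs.items()
    }


# Hypothetical, pre-segmented traffic texts (illustrative tokens only).
docs = {
    "d1": ["公交车", "公路", "车票"],  # bus, highway, ticket
    "d2": ["巴士", "道路"],            # coach, road
    "d3": ["火车", "车票"],            # train, ticket
}

# Hypothetical similar-word sets: near-synonyms collapse into one attribute.
similar_word_sets = [
    {"公交车", "巴士"},  # attribute 0
    {"公路", "道路"},    # attribute 1
    {"火车"},            # attribute 2
    {"车票"},            # attribute 3
]

context = build_formal_context(docs, similar_word_sets)
for doc_id in sorted(context):
    print(doc_id, sorted(context[doc_id]))
```

With single words as attributes, this toy corpus would need six columns; grouping near-synonyms leaves four, which is the kind of redundancy reduction the similar-word sets are meant to provide.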
A Semantic Web ontology is a knowledge base that supports the actual running of the Semantic Web. It formally defines the widely acknowledged knowledge in a domain and the relationships within that knowledge, expressed concretely as the widely acknowledged concepts of the domain and the relationships between them. A concept lattice formally defines concepts and the relationships between them. Although both the concept lattice and the ontology are knowledge bases, the former rests on mathematical rigor, while the latter pursues convenience of application. Taking "practical domain→concept lattice→ontology" as the development route helps to build high-quality Semantic Web ontologies.
     Most existing ontologies are in English, and there is no effective tool with integrated functions for building Chinese ontologies. Because the real world is uncertain, human cognition of it inevitably shows fuzziness and roughness, and the rough concept is one of the basic forms of human cognition. A rough concept lattice is a knowledge base expressed as rough concepts and the relationships between them. Following the route "practical domain→rough concept lattice→Chinese ontology" and applying the result to support a Semantic Web that operates closer to the real world is the research target and technical route of this thesis. The formal context determines the structural complexity of the rough concept lattice, so this thesis focuses on the formal context extracted from the data source.
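To make the notions of formal context and rough concept concrete, here is a minimal sketch. It assumes a triple form (upper extent, lower extent, intent), where the lower extent holds the objects that have every attribute of the intent and the upper extent holds the objects that have at least one. The abstract does not state the definition, so this formulation, the function names, and the toy data are all assumptions made for illustration.

```python
from typing import Dict, FrozenSet, Set, Tuple

# A formal context as a mapping: object -> set of attributes it has.
Context = Dict[str, FrozenSet[str]]


def lower_extent(context: Context, intent: Set[str]) -> Set[str]:
    """Objects that have every attribute in the intent."""
    return {obj for obj, attrs in context.items() if intent <= attrs}


def upper_extent(context: Context, intent: Set[str]) -> Set[str]:
    """Objects that have at least one attribute in the intent."""
    return {obj for obj, attrs in context.items() if intent & attrs}


def rough_concept(
    context: Context, intent: Set[str]
) -> Tuple[Set[str], Set[str], Set[str]]:
    """Return the assumed triple (upper extent, lower extent, intent).

    For a non-empty intent the lower extent is a subset of the upper one.
    """
    return upper_extent(context, intent), lower_extent(context, intent), set(intent)


# Hypothetical toy context: three documents described by four attributes.
context: Context = {
    "d1": frozenset({"bus", "road", "ticket"}),
    "d2": frozenset({"bus", "road"}),
    "d3": frozenset({"train", "ticket"}),
}

upper, lower, intent = rough_concept(context, {"bus", "ticket"})
print("upper extent:", sorted(upper))
print("lower extent:", sorted(lower))
print("intent:", sorted(intent))
```

A classical formal concept is a pair (extent, intent), so under this assumption a rough formal concept carries one more component, which is why construction algorithms written for pairs need to be extended.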
     First, extract the formal context. For practicality, this thesis takes unstructured Chinese texts as the data source of the formal context. To extract a concise formal context, the concept of a set of similar-word sets is proposed to reduce the redundancy caused by single words.
     Second, reduce the formal context. The reduction algorithm for the formal context corresponding to a concept lattice is extended.
     Third, extract rough formal concepts. Rough formal concepts are extracted from the reduced formal context.
     Fourth, build the rough concept lattice. Because a rough formal concept and a formal concept have different numbers of tuple components, concept lattice building algorithms are studied and extended to build the rough concept lattice.
     Fifth, transform the rough concept lattice into a Chinese Semantic Web ontology. Based on the study of formal concept analysis applied to ontology building, the rough concept lattice is processed to form an ontology prototype, and an ontology description language is used to describe the ontology, thereby transforming the rough concept lattice into a Chinese Semantic Web ontology.
     Finally, carry out example verification. The feasibility and practical value of the rough concept lattice method for building Chinese Semantic Web ontologies are verified using 214 Chinese texts in the traffic category from Sogou Labs as the data source.
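The transformation step can be sketched roughly as follows, assuming OWL written in Turtle syntax as the ontology description language (the abstract does not name one). Each lattice node becomes a class, and each direct subconcept link becomes an `rdfs:subClassOf` statement. The sketch uses only the intent of each node and ignores the extents; the class names, Chinese labels, and the `http://example.org/traffic#` namespace are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple


def cover_pairs(intents: Dict[str, FrozenSet[str]]) -> List[Tuple[str, str]]:
    """Return (sub, super) pairs of direct neighbours in the lattice order.

    A node is a subconcept of another when its intent strictly contains the
    other's intent; a pair is kept only if no third node lies in between.
    """
    pairs = []
    for sub, b_sub in intents.items():
        for sup, b_sup in intents.items():
            if b_sup < b_sub and not any(
                b_sup < b_mid < b_sub for b_mid in intents.values()
            ):
                pairs.append((sub, sup))
    return sorted(pairs)


def to_turtle(
    intents: Dict[str, FrozenSet[str]],
    labels: Dict[str, str],
    base: str = "http://example.org/traffic#",  # hypothetical namespace
) -> str:
    """Serialize classes, Chinese labels, and subclass links as Turtle."""
    lines = [
        f"@prefix : <{base}> .",
        "@prefix owl: <http://www.w3.org/2002/07/owl#> .",
        "@prefix rdfs: <http://www.w3.org/2000/01/rdf-schema#> .",
        "",
    ]
    for name in sorted(intents):
        lines.append(f':{name} a owl:Class ; rdfs:label "{labels[name]}"@zh .')
    for sub, sup in cover_pairs(intents):
        lines.append(f":{sub} rdfs:subClassOf :{sup} .")
    return "\n".join(lines)


# Hypothetical lattice nodes, each identified by its intent.
intents = {
    "Transport": frozenset(),
    "RoadTransport": frozenset({"road"}),
    "BusService": frozenset({"road", "bus"}),
    "RailTransport": frozenset({"rail"}),
}
labels = {
    "Transport": "交通",
    "RoadTransport": "公路运输",
    "BusService": "公交服务",
    "RailTransport": "铁路运输",
}

turtle = to_turtle(intents, labels)
print(turtle)
```

Emitting only the direct neighbour links keeps the class hierarchy free of redundant statements, since `rdfs:subClassOf` is transitive.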