Research on Linked Course Data Organization and Knowledge Management
Abstract
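With the continuous development of information and network technologies, the volume of global digital information is growing rapidly, by about 10^18 bytes per year. Massive learning resources suffer from inconsistent standards, heterogeneous organization, and a lack of semantic links, which seriously constrains the management of online teaching resources and knowledge sharing. At present, much of the knowledge data on the Internet is stored in teaching-resource documents, whose granularity and structure are ill-suited to knowledge processing, integration, and management.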
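     In 2006, Tim Berners-Lee proposed the concept of Linked Data, which offers a new approach to the problems above. Publishing knowledge data with Linked Data technology is the most important step toward realizing the Web of Data, and ontologies are a key Semantic Web technology for knowledge representation, reasoning, sharing, and reuse. The Resource Description Framework (RDF), RDF Schema (RDFS), and the Web Ontology Language (OWL) published by the W3C are used to formally define the concepts that appear in documents and the relationships between them. Their global applicability and machine readability strengthen semantic retrieval and human-machine collaboration, so that unstructured learning resources can be turned into manageable knowledge. The main roles of Linked Data are integrating data and attaching semantics to it. Moreover, extensive practice shows that Linked Data technology can provide e-learning systems with new, semantically rich knowledge services. Based on this background, this thesis argues that the construction, organization, and knowledge management of linked course data are an important direction for future e-learning research.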
     The thesis focuses on two parts: (1) organization of linked course data; (2) knowledge management of linked data.
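     In the first part, linked course data organization, course teaching resources are first converted automatically or semi-automatically into RDF data. The RDF knowledge data are then refined and processed, and links are created among knowledge items and between this dataset and other datasets currently in the Linked Open Data (LOD) cloud, yielding linked course data. These data are then described with the OWL ontology language, so that the linked data evolve into a knowledge ontology.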
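     In the second part, knowledge management, the thesis discusses the storage and indexing of semantic data and the coreference of entities across different datasets.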
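     The research covers data conversion, linked course data construction, knowledge ontology construction, storage and indexing of linked data, and data integration.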
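     (1) Methods are proposed for converting various types of teaching-resource documents into RDF data, including a novel four-step method for converting tabular data into RDF. First, the column headers of a table are linked to classes in LOD datasets; next, the cell values are linked to instances of those classes; then, implicit semantic relations between the table's columns are mined; finally, the semantic annotations are output.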
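     (2) To address knowledge representation, a method for constructing linked course data is proposed. Taking courses such as Microcomputer Interfaces and Computer Organization as examples, an RDF dataset for computer hardware courses is built. This dataset is then linked via owl:sameAs to large related LOD datasets such as DBpedia, forming a linked dataset. Building on the linked course data, the idea of a knowledge ontology is introduced: predicates that support knowledge linking and navigation are used, and prerequisite/successor (cognitive-order) relations between knowledge points are added, enriching the linked course data with semantics. This provides a sound data foundation for upper-layer knowledge service platforms.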
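     (3) To address the storage and indexing of linked data, a storage and indexing architecture for linked data is proposed. For storage, the MonetDB column store is used to improve query performance. For indexing, a vertically partitioned system is extended with five indexes: sIndex (subject index), pIndex (predicate index), oIndex (object index), a value index, and a class index. In addition, the ordering of join operations is optimized to improve query performance.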
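     (4) To address the coreference of entities across datasets, the thesis proposes (a) a baseline method that uses the explicit owl:sameAs relations in the data to integrate (consolidate) datasets, and (b) an extension of the baseline that uses a subset of the OWL 2 RL/RDF rules (exploiting inverse-functional properties, functional properties, cardinality restrictions, and so on) to infer new owl:sameAs relations, which are then integrated with the baseline method.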
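The four-step table conversion in item (1) can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the dictionaries LOD_CLASSES, LOD_INSTANCES, and LOD_RELATIONS are hypothetical stand-ins for a real LOD lookup service, all http://example.org/lod/ URIs are placeholders, and matching is exact string comparison rather than the ranked or fuzzy matching a real system would need. Each function corresponds to one step, and the output is N-Triples text.

```python
# Minimal sketch of the four-step table-to-RDF method (toy data, exact matching).
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"

# Hypothetical stand-ins for an LOD lookup service (placeholder URIs).
LOD_CLASSES = {
    "course": "http://example.org/lod/Course",
    "teacher": "http://example.org/lod/Person",
}
LOD_INSTANCES = {
    ("http://example.org/lod/Course", "Computer Organization"):
        "http://example.org/lod/Computer_Organization",
    ("http://example.org/lod/Person", "Teacher A"):
        "http://example.org/lod/Teacher_A",
}
LOD_RELATIONS = {
    ("http://example.org/lod/Course", "http://example.org/lod/Person"):
        "http://example.org/lod/taughtBy",
}


def link_headers(headers):
    """Step 1: link each column header to an LOD class (None if unknown)."""
    return [LOD_CLASSES.get(h.strip().lower()) for h in headers]


def link_cells(rows, col_classes):
    """Step 2: link each cell value to an instance of its column's class."""
    return [
        [LOD_INSTANCES.get((cls, value)) if cls else None
         for cls, value in zip(col_classes, row)]
        for row in rows
    ]


def find_column_relations(col_classes):
    """Step 3: find a known relation for each ordered pair of linked columns."""
    relations = {}
    for i, ci in enumerate(col_classes):
        for j, cj in enumerate(col_classes):
            if i != j and (ci, cj) in LOD_RELATIONS:
                relations[(i, j)] = LOD_RELATIONS[(ci, cj)]
    return relations


def annotate(headers, rows):
    """Step 4: output the semantic annotations as N-Triples lines."""
    col_classes = link_headers(headers)
    cells = link_cells(rows, col_classes)
    relations = find_column_relations(col_classes)
    triples = []
    for row in cells:
        for cls, inst in zip(col_classes, row):
            if inst:
                triples.append(f"<{inst}> <{RDF_TYPE}> <{cls}> .")
        for (i, j), pred in relations.items():
            if row[i] and row[j]:
                triples.append(f"<{row[i]}> <{pred}> <{row[j]}> .")
    return triples


if __name__ == "__main__":
    table_headers = ["Course", "Teacher"]
    table_rows = [["Computer Organization", "Teacher A"]]
    for line in annotate(table_headers, table_rows):
        print(line)
```

For the single sample row the script prints two rdf:type triples followed by one taughtBy triple. In practice steps 1 and 2 would return ranked candidates rather than a single exact match.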
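The prerequisite relations and owl:sameAs links described in item (2) can be sketched as plain triples, from which a valid learning order follows by topological sort. This is an illustration under assumptions, not the thesis's vocabulary: the namespace http://example.org/course/, the predicate hasPrerequisite, and the sample knowledge points are hypothetical, and the DBpedia URIs are only illustrative link targets. The graphlib module requires Python 3.9 or later.

```python
from graphlib import TopologicalSorter  # standard library, Python 3.9+

EX = "http://example.org/course/"                  # hypothetical namespace
OWL_SAMEAS = "http://www.w3.org/2002/07/owl#sameAs"
HAS_PREREQ = EX + "hasPrerequisite"                # hypothetical predicate

# Illustrative linked course data: prerequisites plus links to DBpedia.
course_triples = [
    (EX + "Interrupt", HAS_PREREQ, EX + "CPU"),
    (EX + "DMA", HAS_PREREQ, EX + "Interrupt"),
    (EX + "DMA", HAS_PREREQ, EX + "Bus"),
    (EX + "CPU", OWL_SAMEAS, "http://dbpedia.org/resource/Central_processing_unit"),
    (EX + "Interrupt", OWL_SAMEAS, "http://dbpedia.org/resource/Interrupt"),
]


def learning_order(triples):
    """Order knowledge points so that every prerequisite comes first."""
    graph = {}  # node -> set of its prerequisites (predecessors)
    for s, p, o in triples:
        if p == HAS_PREREQ:
            graph.setdefault(s, set()).add(o)
            graph.setdefault(o, set())
    return list(TopologicalSorter(graph).static_order())


def external_links(triples):
    """Map each local knowledge point to its owl:sameAs targets."""
    links = {}
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            links.setdefault(s, []).append(o)
    return links


if __name__ == "__main__":
    links = external_links(course_triples)
    for point in learning_order(course_triples):
        print(point.removeprefix(EX), links.get(point, []))
```

TopologicalSorter raises graphlib.CycleError if the prerequisite relation contains a cycle, which doubles as a consistency check on hand-authored course data.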
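     Finally, extensive experiments on authoritative benchmark datasets demonstrate the effectiveness of the proposed algorithms, and the development of and experiments with a prototype system demonstrate the effectiveness of the proposed architecture. Together, these results indicate that the in-depth research on the four key technologies and the proposed solutions are innovative, feasible, and effective.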
With the development of information and network technologies, global digital information is increasing dramatically, at a rate of 10^18 bytes per year. Massive learning resources lack semantic association and are heterogeneous and inconsistently normalized, which seriously restricts the management of online learning resources and knowledge sharing. Most knowledge on the Internet is stored in learning-resource documents, whose granularity and structure pose many challenges for knowledge processing, integration, and management.
     In 2006, Tim Berners-Lee proposed the concept of Linked Data, which offers a new way to solve the problems mentioned above. Publishing knowledge data with Linked Data technology is the most important step toward the Web of Data; ontology is a key technology for knowledge representation, reasoning, sharing, and reuse on the Semantic Web. Using the Resource Description Framework (RDF), RDF Schema (RDFS), and the Web Ontology Language (OWL) published by the World Wide Web Consortium (W3C), the concepts in documents and the relationships between them are formally defined; the global applicability and machine readability of these standards enhance semantic retrieval and human-computer collaboration, so that unstructured learning resources can be transformed into manageable knowledge. The most important roles of Linked Data are data integration and adding semantics. At the same time, many applications show that Linked Data can provide e-learning systems with new semantic knowledge services. Based on the above analysis, this thesis argues that the construction, organization, and knowledge management of Linked Course data are a major research direction for e-learning in the future.
     This thesis concerns two main parts: (1) organization of Linked Courses data; (2) knowledge management of Linked Courses data. In the first stage, organization, learning resources are converted automatically or semi-automatically into RDF data; the RDF knowledge data are then extracted and processed to create links among knowledge items and with other datasets in the LOD cloud, producing linked course data, which are finally described with the OWL ontology language so that the linked data evolve into a knowledge ontology. In the second stage, knowledge management, the thesis discusses semantic data storage, indexing, and coreference between different datasets.
     This thesis studies data transformation, linked course data construction, knowledge ontology construction, storage and indexing of linked data, and data integration.
     (1) This thesis proposes a novel four-step method for transforming spreadsheets into RDF data. First, the column headers of the table are linked to classes in LOD datasets; second, the cell values are linked to instances of these classes; third, the semantic relationships between columns in the table are discovered; finally, linked semantic annotations are output.
     (2) This thesis proposes a Linked Courses data construction method and builds RDF datasets for computer hardware courses, based on the Microcomputer Interfaces and Computer Organization courses. Large related LOD datasets, such as DBpedia, are linked to these data via owl:sameAs. After constructing the Linked Courses data, we introduce the idea of ontology by using predicates that facilitate knowledge linking and navigation and by adding prerequisite/successor (cognitive-order) relationships between knowledge points. This provides a better data foundation for upper-layer knowledge service platforms.
     (3) This thesis proposes a storage system that uses five indexes, namely Subject, Predicate, Object, Value, and Class, on top of any column-oriented DB. The main techniques used by the proposed scheme are horizontal partitioning of the logical indexes and special indexes for values and classes. This approach has the advantage of delivering better performance as the underlying column-store technology improves. It is conceptually much simpler than state-of-the-art native-storage proposals and gives roughly the same performance. The proposal extends an existing approach, SW-Store, which uses column-oriented DBs and vertical partitioning, and obtains a two- to three-fold performance improvement.
     (4) With respect to consolidation, we investigate (a) a baseline approach, which uses explicit owl:sameAs relations to perform consolidation; and (b) extended entity consolidation, which additionally uses a subset of OWL 2 RL/RDF rules to derive new owl:sameAs relations through the semantics of inverse-functional properties, functional properties, and (max-)cardinality restrictions with value one.
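The five indexes in item (3) and the join-ordering idea can be illustrated with an in-memory Python sketch. This is neither MonetDB nor the thesis's physical layout: the index structures, the crude literal test (a leading double quote), the assumption that every pattern has a bound predicate, and the greedy heuristic (evaluate the pattern with the fewest candidate triples first, then prefer patterns sharing an already-bound variable) are all choices made for illustration.

```python
from collections import defaultdict

RDF_TYPE = "rdf:type"


def is_var(term):
    """Query variables are written as '?name'."""
    return term.startswith("?")


class TripleStore:
    """In-memory stand-in for the five indexes (hypothetical layout)."""

    def __init__(self, triples):
        self.s_index = defaultdict(set)      # subject -> {(predicate, object)}
        self.p_index = defaultdict(set)      # predicate -> {(subject, object)}
        self.o_index = defaultdict(set)      # object -> {(subject, predicate)}
        self.value_index = defaultdict(set)  # literal -> {(subject, predicate)}
        self.class_index = defaultdict(set)  # class -> {instances}
        for s, p, o in triples:
            self.s_index[s].add((p, o))
            self.p_index[p].add((s, o))
            self.o_index[o].add((s, p))
            if o.startswith('"'):            # crude literal test for this sketch
                self.value_index[o].add((s, p))
            if p == RDF_TYPE:
                self.class_index[o].add(s)

    def candidates(self, pattern):
        """(subject, object) pairs for one pattern, via the most specific index."""
        s, p, o = pattern
        if p == RDF_TYPE and not is_var(o):
            pairs = {(x, o) for x in self.class_index.get(o, ())}
        elif not is_var(o) and o.startswith('"'):
            pairs = {(x, o) for x, q in self.value_index.get(o, ()) if q == p}
        elif not is_var(o):
            pairs = {(x, o) for x, q in self.o_index.get(o, ()) if q == p}
        elif not is_var(s):
            pairs = {(s, y) for q, y in self.s_index.get(s, ()) if q == p}
        else:
            pairs = self.p_index.get(p, set())
        return {(x, y) for x, y in pairs
                if (is_var(s) or x == s) and (is_var(o) or y == o)}


def pattern_vars(pattern):
    return {t for t in (pattern[0], pattern[2]) if is_var(t)}


def order_patterns(store, patterns):
    """Greedy join order: fewest candidates first, preferring connected patterns."""
    remaining, ordered, bound = list(patterns), [], set()
    while remaining:
        connected = [pt for pt in remaining if pattern_vars(pt) & bound]
        best = min(connected or remaining, key=lambda pt: len(store.candidates(pt)))
        ordered.append(best)
        remaining.remove(best)
        bound |= pattern_vars(best)
    return ordered


def query(store, patterns):
    """Evaluate a basic graph pattern with nested-loop joins in the chosen order."""
    results = [{}]
    for s, p, o in order_patterns(store, patterns):
        pairs = store.candidates((s, p, o))
        joined = []
        for binding in results:
            for x, y in pairs:
                b = dict(binding)
                # bind unbound variables; reject rows that conflict with a binding
                if all(b.setdefault(t, v) == v for t, v in ((s, x), (o, y)) if is_var(t)):
                    joined.append(b)
        results = joined
    return results


sample_triples = [
    ("ex:Interrupt", "rdf:type", "ex:KnowledgePoint"),
    ("ex:DMA", "rdf:type", "ex:KnowledgePoint"),
    ("ex:Cache", "rdf:type", "ex:KnowledgePoint"),
    ("ex:DMA", "ex:hasPrerequisite", "ex:Interrupt"),
    ("ex:Interrupt", "rdfs:label", '"Interrupt"'),
    ("ex:DMA", "rdfs:label", '"DMA"'),
]

if __name__ == "__main__":
    store = TripleStore(sample_triples)
    bgp = [("?k", "rdf:type", "ex:KnowledgePoint"),
           ("?k", "rdfs:label", '"DMA"'),
           ("?k", "ex:hasPrerequisite", "?pre")]
    print(order_patterns(store, bgp))
    print(query(store, bgp))
```

Here the label pattern (one candidate, served by the value index) is evaluated first, so the broad rdf:type pattern (three candidates, served by the class index) only filters an already-small intermediate result.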
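The two consolidation strategies in item (4) can be sketched with a union-find structure. consolidate is the baseline: it merges identifiers linked by explicit owl:sameAs and rewrites every triple with one canonical identifier (here simply the lexicographically smallest, a simplification rather than the thesis's choice). infer_sameas applies three OWL 2 RL/RDF rules in a single pass: prp-ifp (inverse-functional properties), prp-fp (functional properties), and cls-maxc2 (max-cardinality-1 restrictions, passed as a hypothetical class-to-property map). Property names and sample data are illustrative; a full implementation would repeat inference until no new owl:sameAs pairs appear.

```python
OWL_SAMEAS = "owl:sameAs"
RDF_TYPE = "rdf:type"


class UnionFind:
    """Disjoint sets; each root is the smallest identifier in its set."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            self.parent[max(ra, rb)] = min(ra, rb)


def consolidate(triples, extra_sameas=()):
    """Baseline: merge owl:sameAs-linked identifiers, rewrite with canonical ids."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == OWL_SAMEAS:
            uf.union(s, o)
    for a, b in extra_sameas:  # pairs derived by infer_sameas, if any
        uf.union(a, b)
    return {(uf.find(s), p, uf.find(o)) for s, p, o in triples if p != OWL_SAMEAS}


def infer_sameas(triples, ifps=(), fps=(), max1_restrictions=None):
    """One pass of the OWL 2 RL rules prp-ifp, prp-fp and cls-maxc2."""
    max1_restrictions = max1_restrictions or {}  # class -> property (max card. 1)
    types = {}
    for s, p, o in triples:
        if p == RDF_TYPE:
            types.setdefault(s, set()).add(o)
    groups = {}
    for s, p, o in triples:
        if p in ifps:  # prp-ifp: subjects sharing a value are the same
            groups.setdefault(("ifp", p, o), set()).add(s)
        if p in fps or any(max1_restrictions.get(c) == p for c in types.get(s, ())):
            # prp-fp / cls-maxc2: values of the same subject are the same
            groups.setdefault(("fp", s, p), set()).add(o)
    derived = set()
    for members in groups.values():
        first, *rest = sorted(members)
        derived.update((first, m) for m in rest)
    return derived


sample = [
    ("dbpedia:CPU", OWL_SAMEAS, "ex:CPU"),
    ("ex:CPU", "rdfs:label", '"CPU"'),
    ("ex:alice", "foaf:mbox", "mailto:alice@example.org"),
    ("ex2:a1", "foaf:mbox", "mailto:alice@example.org"),
]

if __name__ == "__main__":
    print(sorted(consolidate(sample)))                      # baseline
    extra = infer_sameas(sample, ifps={"foaf:mbox"})        # extended
    print(sorted(consolidate(sample, extra)))
```

In the sample, the explicit owl:sameAs merges the two CPU identifiers, and foaf:mbox, which the FOAF vocabulary declares inverse-functional, lets the extended method merge the two person identifiers as well.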
     Finally, a large number of experiments on authoritative industry datasets prove the effectiveness of the proposed algorithms, and the development of a prototype system and its experimental results prove the effectiveness of the proposed architecture. Together, the results show that the work on the four key technologies in this thesis is innovative, feasible, and effective for course data organization and knowledge management.
