Research on Key Technologies of Subject-Oriented Knowledge Element Indexing
Abstract
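With the arrival of the knowledge economy, knowledge is receiving unprecedented attention. Managing knowledge scientifically and using it effectively adds value to it, and this has become an important goal of knowledge management in the digital age. In an information technology environment, providing users with accurate, personalized knowledge is a current research hotspot. Better knowledge services require knowledge organization and management to be studied at the level of the knowledge element. Building on theories of knowledge organization and management, this dissertation does the following:

- It establishes a knowledge element description model.
- It proposes a method for extracting knowledge-element guide information.
- It proposes a method for knowledge element indexing.
- It develops a system and applies it to typical cases.

The main contents are as follows: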
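     First, a subject-oriented knowledge element description model is established. The dissertation defines the concept of a knowledge element and takes textual knowledge elements as its research object. Using object-oriented design, it proposes a subject-oriented knowledge element description model. The model defines nine basic constituent elements of a knowledge element and is expressed as an XML document. It is then applied to describing textual knowledge elements in academic subjects.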
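The abstract names only four of the model's nine elements: name, description, attribute and source. The following Python sketch shows how such a record might be held in memory and serialized to XML. It is an illustration only. The class name, the XML tag names and the restriction to four elements are assumptions; the dissertation's actual schema is not reproduced here.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass, field


@dataclass
class KnowledgeElement:
    """Hypothetical subset of the nine-element model: only the four
    elements named in the abstract (name, description, attributes, source)."""
    name: str
    description: str
    attributes: dict[str, str] = field(default_factory=dict)
    source: str = ""

    def to_xml(self) -> str:
        # Tag names are illustrative assumptions, not the dissertation's schema.
        root = ET.Element("knowledgeElement")
        ET.SubElement(root, "name").text = self.name
        ET.SubElement(root, "description").text = self.description
        attrs = ET.SubElement(root, "attributes")
        for key, value in self.attributes.items():
            ET.SubElement(attrs, "attribute", {"name": key}).text = value
        ET.SubElement(root, "source").text = self.source
        return ET.tostring(root, encoding="unicode")

    @classmethod
    def from_xml(cls, text: str) -> "KnowledgeElement":
        root = ET.fromstring(text)
        attrs_elem = root.find("attributes")
        attrs = {} if attrs_elem is None else {
            a.get("name"): a.text or "" for a in attrs_elem
        }
        # findtext returns "" for an element that exists but has no text
        return cls(
            name=root.findtext("name", ""),
            description=root.findtext("description", ""),
            attributes=attrs,
            source=root.findtext("source", ""),
        )
```

A dataclass keeps the record comparable by value, so a round trip through `to_xml` and `from_xml` can be checked with `==`.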
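     Second, a method for extracting guide information (that is, subject words) based on neighbour word co-occurrence analysis is proposed. The word-formation patterns of guide information are discovered by manually counting and analysing the features of existing subject words. The method then combines word frequency, part of speech, context features and position to extract guide information from neighbour word co-occurrence, and it gives a formal definition of the neighbour co-occurrence information it uses. The method needs no domain-specific dictionary, and its extraction results are better than those of the traditional TF/IDF method.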
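Below is a minimal sketch of the neighbour-word idea. It assumes the text has already been segmented and POS-tagged by a Chinese word segmenter.

1. Adjacent word pairs whose POS pattern matches a term-formation pattern are merged into candidate subject words.
2. A candidate is kept only if it recurs.
3. A candidate that also appears in the title is boosted.

The pattern set, `min_count` and `title_bonus` are illustrative assumptions. The dissertation derives its own patterns from manually indexed subject words and also uses context features that are not modelled here. Because candidates are built from adjacent segmented words, no domain dictionary is needed.

```python
from collections import Counter

# Hypothetical POS patterns for two-word subject terms
# (n = noun, a = adjective, vn = verbal noun); illustrative only.
ALLOWED_PATTERNS = {("n", "n"), ("a", "n"), ("n", "vn")}


def extract_subject_words(sentences, title=(), min_count=2, title_bonus=2.0, top_k=5):
    """Rank merged adjacent-word candidates by co-occurrence frequency.

    sentences: list of sentences, each a list of (word, pos) pairs.
    title: the title as a list of (word, pos) pairs (position feature).
    """
    counts = Counter()
    for sent in sentences:
        # Count each neighbouring pair whose POS pattern looks like a term.
        for (w1, p1), (w2, p2) in zip(sent, sent[1:]):
            if (p1, p2) in ALLOWED_PATTERNS:
                counts[w1 + w2] += 1
    title_terms = {w1 + w2 for (w1, _), (w2, _) in zip(title, title[1:])}
    scores = {}
    for term, freq in counts.items():
        if freq < min_count:
            continue  # a neighbour pair must recur before it counts as a term
        scores[term] = freq * (title_bonus if term in title_terms else 1.0)
    # Highest score first; ties broken alphabetically for stable output.
    return sorted(scores, key=lambda t: (-scores[t], t))[:top_k]
```

For example, suppose 知识/n 元/n recurs in the body and also appears in the title. The merged term 知识元 then ranks above a pair that recurs equally often but appears only in the body.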
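     Third, a rule-based knowledge element indexing method is proposed. Four representative elements of the description model are chosen as concrete research objects: knowledge element name, description, attribute and source. An analysis of manual indexing reveals the syntactic features of knowledge element descriptions. From these features, extraction rules for knowledge elements are built, and a rule-based indexing method is proposed. Experiments show that the method extracts the main knowledge element descriptions in a text and greatly improves indexing efficiency. Relationships between knowledge elements are also studied.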
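The rule-based step can be illustrated with regular expressions over sentences. The two rules below are hypothetical examples of sentence patterns:

- a definition cue such as 是指 ("refers to");
- a composition cue such as 包括 ("includes").

The dissertation's actual rule set comes from manual indexing and is not given in the abstract. A sentence is indexed only if the matched subject is a known subject word. This loosely mirrors how the method combines rules with subject word information.

```python
import re

# Hypothetical extraction rules: each pattern captures a subject ("name")
# and the text that fills one knowledge-element field.
RULES = [
    (re.compile(r"^(?P<name>.+?)(?:是指|是一种|是)(?P<body>.+)$"), "description"),
    (re.compile(r"^(?P<name>.+?)(?:包括|分为)(?P<body>.+)$"), "attribute"),
]


def index_sentence(sentence, subject_words):
    """Return (subject, field, text) for the first rule that matches and whose
    captured subject is a known subject word; return None otherwise."""
    sentence = sentence.strip().rstrip("。")
    for pattern, field_name in RULES:
        m = pattern.match(sentence)
        if m and m.group("name") in subject_words:
            return (m.group("name"), field_name, m.group("body"))
    return None
```

The non-greedy `name` group stops at the first cue word, so for 知识元是指不可再分的知识单位 the subject captured is 知识元.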
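     Fourth, a knowledge element indexing system is designed, implemented and applied. The dissertation covers overall system design, process design, database design, implementation, testing and evaluation. It implements the system's main functions:

- text preprocessing;
- subject word extraction;
- knowledge element description extraction;
- knowledge element representation;
- knowledge element lookup.

The system integrates the results of the three preceding parts into a coherent whole and so verifies the feasibility and effectiveness of the research at the system level. Finally, the system's application to clustering-based retrieval of educational text resources for school subjects is described.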
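The abstract names the system's lookup function but does not describe how it works. One simple way to realize it is an inverted index from subject words to knowledge element identifiers. Elements are ranked by how many query subject words they share. This design is assumed here for illustration, and the class and method names are hypothetical.

```python
from collections import Counter, defaultdict


class KnowledgeElementIndex:
    """Hypothetical in-memory store mapping subject words to knowledge element ids."""

    def __init__(self):
        self._elements = {}                  # element id -> stored record
        self._by_subject = defaultdict(set)  # subject word -> set of element ids

    def add(self, element_id, name, description, subject_words):
        self._elements[element_id] = {"name": name, "description": description}
        for word in subject_words:
            self._by_subject[word].add(element_id)

    def get(self, element_id):
        return self._elements[element_id]

    def search(self, query_words):
        """Return element ids ordered by number of matching subject words (ties by id)."""
        hits = Counter()
        for word in set(query_words):
            for element_id in self._by_subject.get(word, ()):
                hits[element_id] += 1
        return sorted(hits, key=lambda eid: (-hits[eid], eid))
```

Storing ids in sets means a subject word listed twice for the same element is counted once.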
     The main features and innovations of the dissertation are:
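     (1) A subject-oriented knowledge element description model is constructed. Unlike the traditional "resource, index and metadata catalogue" model of information organization, it defines the basic structure of a "knowledge element". This lays the foundation for a "topic, knowledge element and topic map" knowledge organization model.
     (2) A subject word extraction method based on neighbour word co-occurrence analysis is proposed. It draws on the linguistic regularities of subject words and the context of topics to mine automatically the feature words that serve as guide information for domain knowledge elements.
     (3) A rule-based knowledge element indexing method is proposed. Existing indexing methods organize information in coarse-grained resources. This method instead combines semantic content information with associations between subject words and discovers knowledge elements through rules. The result is fine-grained knowledge element indexing that captures semantic features.
     These results lay the groundwork for research on the key technologies of knowledge-element-based knowledge mining, knowledge fusion and knowledge condensation.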
In the knowledge economy era, knowledge has attracted the attention of many scholars. Scientific management and effective application of knowledge add value to it, which has become one of the most important goals of knowledge management in the digital era. Better knowledge services require knowledge organization and knowledge management to be studied at the knowledge element level. Based on theories of knowledge organization and management, this dissertation:

- builds a data model for the knowledge element;
- proposes a method for extracting guide information about knowledge elements;
- proposes a method for knowledge element indexing;
- designs and implements a system that realizes these methods.

The main contents of this dissertation are as follows:
     First, a subject-oriented knowledge element data model is constructed.
     After defining the knowledge element, we choose the text knowledge element as the research object. We then build a subject-oriented knowledge element data model using object-oriented design methods; the model comprises nine basic elements. An XML document represents the data model and is used to describe the text knowledge elements of a subject.
     Second, a method for extracting guide information from documents based on Neighbour Words Co-occurrence is proposed.
     Here, guide information means subject words. By analysing sections of journal articles, we obtain the properties of manually indexed subject words. The proposed extraction method, based on Neighbour Words Co-occurrence, draws on linguistic properties of the text such as frequency, part of speech, context characteristics and location, and we give its basic definitions. The experimental results show that our method needs no subject dictionary and performs better than traditional TF/IDF.
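For reference, the TF/IDF baseline that the method is compared against can be sketched as follows. It uses one common weighting: term frequency normalized by document length, multiplied by log(N/df). The abstract does not say which TF/IDF variant was used in the experiments, so this choice is an assumption.

```python
import math
from collections import Counter


def tfidf_keywords(docs, doc_index, top_k=3):
    """Rank the words of docs[doc_index] by TF-IDF.

    docs: list of documents, each a non-empty list of segmented words.
    Score = (count / document length) * log(N / document frequency).
    """
    n_docs = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(doc))  # document frequency: count each word once per doc
    target = docs[doc_index]
    tf = Counter(target)
    scores = {w: (c / len(target)) * math.log(n_docs / df[w]) for w, c in tf.items()}
    # Highest score first; ties broken alphabetically.
    return sorted(scores, key=lambda w: (-scores[w], w))[:top_k]
```

This baseline ranks individual segmented words. A term that the segmenter splits into several words is therefore never scored as a unit.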
     Third, a rule-based knowledge element indexing method is proposed.
     We choose four typical elements of the knowledge element data model: name, description, property and source. We manually analyse their general construction characteristics and identify recurring sentence patterns. From this analysis we build extraction rules for knowledge elements and propose a rule-based indexing method. The experimental results show that our method extracts the main knowledge element descriptions and greatly improves the efficiency of knowledge element indexing. In addition, we study the relationships between knowledge elements.
     Finally, a subject-oriented knowledge element indexing system is designed and implemented. The dissertation details the overall system design, flow chart, database, implementation, testing and evaluation. It implements the main functions of the indexing system: text preprocessing, subject word extraction, extraction of knowledge element description sentences, and presentation and retrieval of knowledge elements. This prototype system integrates the three aspects above and demonstrates the feasibility and effectiveness of our research. We also describe how the system is applied to knowledge-element-based clustering retrieval of educational text resources.
     The main contributions of this dissertation are:
     (1) A subject-oriented knowledge element data model is constructed. Unlike the traditional information organization model based on resources, indexes and metadata catalogues, this model defines the structure of a knowledge element. This lays the foundation for a "topic, knowledge element and topic map" knowledge organization model.
     (2) A subject word extraction method based on Neighbour Words Co-occurrence is proposed. It draws on the linguistic properties of texts, the subject context and the characteristics of subject words to extract subject words automatically.
     (3) A knowledge element indexing method based on extraction rules is proposed. Unlike coarse-grained information and resource indexing methods, it combines semantic content information with relationships between subject words and discovers knowledge elements in documents by rules. This achieves fine-grained, semantic knowledge element indexing.
     The results of this dissertation lay the foundation for research on the key technologies of knowledge-element-based knowledge mining, knowledge fusion and knowledge condensation.