Research on Relation Extraction for Social Network Applications
Abstract
Since the advent of search engines, people have been flooded with information, most of it repetitive. A search engine returns so many results that useful information remains hard to find. If the retrieved results could be filtered effectively, so that only the key information people need is extracted and presented as a network graph rather than as plain text, information could be obtained far more efficiently. On this basis, this thesis studies relation extraction between named entities in the social network domain. It attempts to build a social relationship ontology for that domain and, from sentences containing two or more named entities, extracts the words that describe the relation between those entities. It also defines a series of SWRL rules and uses the Jess reasoning engine to mine the implicit social relationships in the ontology.
     For named entity recognition, the thesis focuses on personal names and organization names. Borrowing the idea of semantic role labeling, it uses the Viterbi algorithm to automatically label the role that each segmented token in a sentence plays within a personal or organization name. Word-formation rules derived from the characteristics of such names are then applied through pattern matching to produce the final recognition results. In an open test on a real corpus, recall was higher than precision and approached 70%, which leaves considerable room for improvement but indicates that the method is effective.
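The role-labeling step can be illustrated with a minimal Python sketch. It is not the thesis's implementation: the role set (O for an ordinary token, S for a surname, G for a given-name token), every probability, the romanized tokens, and the name pattern `SG{1,2}` are assumptions invented for this example. It shows only the general idea of decoding the most probable role sequence with Viterbi and then matching a word-formation pattern against it.

```python
import math
import re

# Hypothetical role tags: O = ordinary token, S = surname, G = given-name token.
ROLES = ["O", "S", "G"]

# Toy probabilities, invented for illustration only (not taken from the thesis).
START = {"O": 0.8, "S": 0.2, "G": 0.0}
TRANS = {
    "O": {"O": 0.8, "S": 0.2, "G": 0.0},
    "S": {"O": 0.1, "S": 0.0, "G": 0.9},
    "G": {"O": 0.6, "S": 0.1, "G": 0.3},
}
EMIT = {
    "O": {"reporter": 0.3, "met": 0.3, "yesterday": 0.3},
    "S": {"Zhang": 0.5, "Li": 0.5},
    "G": {"Hua": 0.4, "Ping": 0.4, "Li": 0.2},
}
FLOOR = 1e-6  # small probability for tokens a role never emitted in the toy table


def _log(p):
    """Logarithm that maps a zero probability to negative infinity."""
    return math.log(p) if p > 0 else float("-inf")


def viterbi(tokens):
    """Return the most probable role sequence for the given tokens."""
    # score[r] is the best log-probability of any path ending in role r.
    score = {r: _log(START[r]) + _log(EMIT[r].get(tokens[0], FLOOR)) for r in ROLES}
    back = []  # one backpointer table per token after the first
    for tok in tokens[1:]:
        new_score, pointers = {}, {}
        for r in ROLES:
            prev = max(ROLES, key=lambda p: score[p] + _log(TRANS[p][r]))
            new_score[r] = (score[prev] + _log(TRANS[prev][r])
                            + _log(EMIT[r].get(tok, FLOOR)))
            pointers[r] = prev
        score = new_score
        back.append(pointers)
    # Follow the backpointers from the best final role.
    path = [max(ROLES, key=lambda r: score[r])]
    for pointers in reversed(back):
        path.append(pointers[path[-1]])
    return path[::-1]


def extract_names(tokens, roles):
    """Apply a toy word-formation rule: a surname plus one or two given-name tokens."""
    role_string = "".join(roles)
    return [" ".join(tokens[m.start():m.end()])
            for m in re.finditer(r"SG{1,2}", role_string)]


tokens = ["reporter", "Zhang", "Hua", "Ping", "met", "Li", "Ping", "yesterday"]
roles = viterbi(tokens)
print(roles)
print(extract_names(tokens, roles))
```

In this toy model the token "Li" can be either a surname or a given-name token; the decoder resolves the ambiguity from context, which is the reason for tagging roles jointly rather than looking up each token in isolation.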
     For relation extraction, the thesis combines the seven-step method and the iterative method from ontology engineering to build a social relationship ontology for the social network domain, applied to enterprises in the Internet industry. The SWRL rules and the ontology are imported together into the Jess rule engine, which reasons over the ontology's conceptual relations to uncover implicit social relationships between entities. The result is a set of (entity, relation, entity) triples stored in a relation base, which condenses the information and makes it more efficient to obtain.
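The idea of deriving implicit relations from explicit triples can be sketched in a few lines of Python. This is a plain forward-chaining loop standing in for the SWRL rules and the Jess engine, not a reproduction of them. The relation names (`ceoOf`, `worksFor`, `colleagueOf`), both rules, and the sample entities are hypothetical and do not come from the thesis's ontology.

```python
from itertools import product


def infer(triples):
    """Forward-chain two toy rules until no new (entity, relation, entity) triple appears."""
    facts = set(triples)
    while True:
        new = set()
        # Rule 1 (hypothetical): ceoOf(?p, ?o) -> worksFor(?p, ?o)
        for subject, relation, obj in facts:
            if relation == "ceoOf":
                new.add((subject, "worksFor", obj))
        # Rule 2 (hypothetical):
        # worksFor(?p, ?o) ^ worksFor(?q, ?o) ^ differentFrom(?p, ?q) -> colleagueOf(?p, ?q)
        employed = [(s, o) for s, r, o in facts if r == "worksFor"]
        for (p, org1), (q, org2) in product(employed, repeat=2):
            if org1 == org2 and p != q:
                new.add((p, "colleagueOf", q))
        if new <= facts:  # fixed point reached: nothing new was derived
            return facts
        facts |= new


explicit = {
    ("Alice", "ceoOf", "ExampleCorp"),
    ("Bob", "worksFor", "ExampleCorp"),
}
implicit = infer(explicit) - explicit
for triple in sorted(implicit):
    print(triple)
```

The loop repeats because one rule's conclusion can satisfy another rule's premise: here the colleague relation only becomes derivable after the first rule has added Alice's `worksFor` triple.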
