Chinese open domain oriented n-ary entity relation extraction

Original title (Chinese): 面向中文开放领域的多元实体关系抽取研究
Authors: YAO Xianming (姚贤明); GAN Jianhou (甘健侯); XU Jian (徐坚)
Affiliations: School of Information Engineering, Qujing Normal University; Key Laboratory of Educational Informatization for Nationalities (Yunnan Normal University), Ministry of Education
Keywords: Chinese open domain; n-ary entity relation; dependency syntax analysis; syntactic structure; relation extraction; semantic relation; subject-predicate-object
Journal: CAAI Transactions on Intelligent Systems (智能系统学报)
Journal code: ZNXT
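Publication date: 2018-12-25 13:20
Publisher: CAAI Transactions on Intelligent Systems (智能系统学报)
Year: 2019
Issue: 03 (v.14; No.77)
Funding: National Natural Science Foundation of China (61562093); Key Project of the Applied Basic Research Program of Yunnan Province (2016FA024)
Language: Chinese
Record ID: ZNXT201903027
Pages: 209-216
Page count: 8
CN: 23-1538/TP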

Abstract

Little research has addressed n-ary entity relation extraction in the Chinese open domain. Drawing on existing research conducted abroad and on the characteristics of the Chinese language, this paper proposes a method for n-ary entity relation extraction for Chinese. Taking the root node of the syntactic analysis result as the entry point, the method iteratively obtains the subject, object, and attributive components of every predicate, then uses the syntactic analysis result to refine these components, and finally obtains the semantic relations among the multiple entities in a sentence. The method was applied to different domains for comparative analysis, and the experimental results indicate that it has some reference value. In addition, the experimental data were analyzed in detail and the main error cases were summarized, pointing out directions for future research.
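
The root-first traversal described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation. The token tuples, the LTP-style dependency labels (HED, SBV, VOB, ATT, COO), the helper names `build_children`, `phrase`, and `extract`, and the rule that a coordinated predicate inherits its subject from the predicate it attaches to are all assumptions made for illustration. The parse is written by hand, so no parser is called.

```python
from collections import defaultdict

# Hand-written dependency parse (hypothetical example sentence).
# Each token is (id, word, head_id, relation); head_id 0 marks the root.
# Labels are assumed LTP-style: HED root, SBV subject, VOB object,
# ATT attributive modifier, COO coordinated predicate.
TOKENS = [
    (1, "李华", 2, "SBV"),    # Li Hua
    (2, "创办", 0, "HED"),    # founded
    (3, "科技", 4, "ATT"),    # technology
    (4, "公司", 2, "VOB"),    # company
    (5, "担任", 2, "COO"),    # serves as
    (6, "总经理", 5, "VOB"),  # general manager
]


def build_children(tokens):
    """Map each head id to the list of its dependents, in sentence order."""
    children = defaultdict(list)
    for tok in tokens:
        children[tok[2]].append(tok)
    return children


def phrase(tok, children):
    """Return the token's word prefixed by its attributive (ATT) modifiers."""
    mods = [phrase(c, children) for c in children[tok[0]] if c[3] == "ATT"]
    return "".join(mods) + tok[1]


def extract(tokens):
    """Walk predicates starting at the root and collect (subject, predicate, objects)."""
    children = build_children(tokens)
    root = next(t for t in tokens if t[3] == "HED")
    results = []
    queue = [(root, None)]  # (predicate token, subject inherited from parent predicate)
    while queue:
        pred, inherited = queue.pop(0)
        kids = children[pred[0]]
        subjects = [phrase(c, children) for c in kids if c[3] == "SBV"]
        objects = [phrase(c, children) for c in kids if c[3] == "VOB"]
        # Assumed refinement rule: a predicate without its own subject
        # reuses the subject of the predicate it is coordinated with.
        subject = subjects[0] if subjects else inherited
        if subject or objects:
            results.append((subject, pred[1], objects))
        for c in kids:
            if c[3] == "COO":
                queue.append((c, subject))
    return results


if __name__ == "__main__":
    for relation in extract(TOKENS):
        print(relation)
```

Keeping the objects in a list leaves room for predicates that take more than two arguments, which is the n-ary case the abstract targets. A real system would also need rules for adverbials, prepositional arguments, and parser errors, none of which are modeled here.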

