A Method for Filtering Open Geo-entity Relations Based on General-purpose Knowledge Bases

English title: A Knowledge-based Method for Filtering Geo-entity Relations
Authors: GAO Jialiang (高嘉良); YU Li (余丽); QIU Peiyuan (仇培元); LU Feng (陆锋)
Affiliations: State Key Laboratory of Resources and Environmental Information System, Institute of Geographic Sciences and Natural Resources Research, Chinese Academy of Sciences; University of Chinese Academy of Sciences; National Science Library, Chinese Academy of Sciences; Jiangsu Center for Collaborative Innovation in Geographical Information Resource Development and Application
Keywords: text data; geo-entity relations extraction; geo-KG building; common knowledge bases; open relation extraction; evaluation of geographic information quality; information filtering
Journal: Journal of Geo-information Science (地球信息科学学报)
Publication date: 2019-09-25
Publisher: Journal of Geo-information Science
Year: 2019
Issue: 09
Funding: Key Program of the National Natural Science Foundation of China (41631177)
Language: Chinese
Pages: 90-99
Page count: 10
CN: 11-5809/P
ISSN: 1560-8999
Classification codes: P208; TP391.1

Abstract

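Text data provide a massive resource for geographic knowledge services. Extracting geo-entity relations from text data is a core technology for constructing geographic knowledge graphs, and it directly affects the quality of geographic knowledge reasoning and services. Because text data inevitably contain noise, geo-entity relations extracted from text require quality evaluation and information filtering. This paper proposes a method for filtering geo-entity relations based on general-purpose knowledge bases, which selects high-quality results from relations that have already been extracted. First, "ontology knowledge", "fact knowledge", and "synonym knowledge" are used to build a geographic relation knowledge base that serves as the reference data for filtering. Then, a distributed vector representation model is used to measure the semantic similarity between the extracted geo-entity relations and the reference data, so as to improve the richness and freshness of geographic knowledge graphs. Experimental results show that, compared with the widely used "Stanford OpenIE" tool, the proposed method reduces the MSE (Mean Square Error) over the confidence intervals [0, 0.2] and [0.8, 1] from 59.27% to 3.94%, and raises the AUC (Area Under the ROC Curve) from 0.51 to 0.89.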
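The similarity-based filtering step can be illustrated with a minimal sketch. It assumes that a triple is represented by the average of its word vectors and that the confidence of an extracted triple is its maximum cosine similarity to any reference triple; the averaging scheme, the toy three-dimensional embeddings, and all names below (`triple_vector`, `confidence`, `TOY_EMBEDDINGS`, `REFERENCE_TRIPLES`) are hypothetical and are not taken from the paper.

```python
import math
from typing import Dict, List, Optional, Sequence, Tuple

Triple = Tuple[str, str, str]          # (subject, relation, object)
Embeddings = Dict[str, List[float]]    # token -> word vector (toy stand-in)


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0.0 or norm_v == 0.0:
        return 0.0
    return dot / (norm_u * norm_v)


def triple_vector(triple: Triple, embeddings: Embeddings) -> Optional[List[float]]:
    """Average the vectors of all known tokens in the triple (an assumption)."""
    tokens = [tok for part in triple for tok in part.lower().split()]
    known = [embeddings[tok] for tok in tokens if tok in embeddings]
    if not known:
        return None  # nothing to compare against
    dim = len(known[0])
    return [sum(vec[i] for vec in known) / len(known) for i in range(dim)]


def confidence(candidate: Triple, references: List[Triple],
               embeddings: Embeddings) -> float:
    """Maximum cosine similarity to any reference triple, floored at 0."""
    cand_vec = triple_vector(candidate, embeddings)
    if cand_vec is None:
        return 0.0
    best = 0.0  # starting at 0 clamps negative similarities to 0
    for ref in references:
        ref_vec = triple_vector(ref, embeddings)
        if ref_vec is not None:
            best = max(best, cosine(cand_vec, ref_vec))
    return best


# Toy data, invented purely for illustration.
TOY_EMBEDDINGS: Embeddings = {
    "beijing": [0.9, 0.1, 0.0],
    "china": [0.8, 0.2, 0.1],
    "capital": [0.1, 0.9, 0.1],
    "yangtze": [0.0, 0.2, 0.9],
    "flows": [0.1, 0.1, 0.8],
}
REFERENCE_TRIPLES: List[Triple] = [("Beijing", "capital of", "China")]

if __name__ == "__main__":
    for cand in [("Beijing", "capital of", "China"),
                 ("Yangtze", "flows", "China")]:
        print(cand, round(confidence(cand, REFERENCE_TRIPLES, TOY_EMBEDDINGS), 3))
```

In practice the toy dictionary would be replaced by vectors from a trained embedding model, and a threshold on the returned confidence would decide which extracted triples to keep.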
Knowledge Graphs (KGs) are crucial resources for supporting geographical knowledge services. Given the vast geographical knowledge in web text, extraction of geo-entity relations from web text has become the core technology for constructing geographical KGs, and it directly affects the quality of geographical knowledge services. However, web text inevitably contains noise, and geographical knowledge can be sparsely distributed; both greatly restrict the quality of geo-entity relationship extraction. Here, we propose a method for filtering geo-entity relations based on existing Knowledge Bases (KBs). Specifically, ontology knowledge, fact knowledge, and synonym knowledge are integrated to generate geo-related knowledge. The extracted geo-entity relationships and the geo-related knowledge are then converted into vectors, and the maximum similarity between the vectors is taken as the confidence value of each extracted geo-entity relationship triple. Our method takes full advantage of existing KBs to assess the quality of geographical information in web text, which helps improve the richness and freshness of geographical KGs. Compared with the Stanford OpenIE method, our method decreased the Mean Square Error (MSE) from 0.62 to 0.06 in the confidence interval [0.7, 1], and improved the Area Under the Receiver Operating Characteristic (ROC) Curve (AUC) from 0.51 to 0.89.
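The two evaluation measures named above can be sketched as follows. This is an illustration under stated assumptions, not the paper's evaluation code: it assumes each triple has a binary gold label (1 for correct, 0 for incorrect), that the interval MSE is the mean squared difference between confidence and label over triples whose confidence falls in the interval, and that AUC is computed by pairwise comparison. The function names and the sample scores are hypothetical.

```python
from typing import List, Optional


def mse_in_interval(scores: List[float], labels: List[int],
                    low: float, high: float) -> Optional[float]:
    """Mean squared error between confidence and gold label, restricted to
    triples whose confidence lies in [low, high]; None if no triple does."""
    pairs = [(s, y) for s, y in zip(scores, labels) if low <= s <= high]
    if not pairs:
        return None
    return sum((s - y) ** 2 for s, y in pairs) / len(pairs)


def auc(scores: List[float], labels: List[int]) -> float:
    """Area under the ROC curve as the fraction of (positive, negative)
    pairs ranked correctly, counting ties as half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Invented confidences and gold labels, for illustration only.
SCORES = [0.95, 0.85, 0.75, 0.40, 0.10]
LABELS = [1, 0, 1, 0, 0]

if __name__ == "__main__":
    print("MSE in [0.7, 1]:", mse_in_interval(SCORES, LABELS, 0.7, 1.0))
    print("AUC:", auc(SCORES, LABELS))
```

The pairwise AUC is quadratic in the number of triples, which is acceptable for a small labelled sample; a rank-based computation would be preferable for large test sets.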

