Semantic Representation Learning Based on HowNet (基于HowNet的语义表示学习)
  • Authors: ZHU Jingwen; YANG Yuji; XU Bin; LI Juanzi
  • Affiliations: School of Information Management, Beijing Information Science and Technology University; Knowledge Engineering Group, Department of Computer Science and Technology, Tsinghua University
  • Keywords: HowNet; knowledge graph; semantic representation; representation learning
  • Journal name (Chinese): MESS
  • Journal name (English): Journal of Chinese Information Processing
  • Publication date: 2019-03-15
  • Publisher: Journal of Chinese Information Processing (中文信息学报)
  • Year: 2019
  • Issue: v.33
  • Funding: National High Technology Research and Development Program of China (863 Program) (2015AA015401); National Key Research and Development Program of the Ministry of Science and Technology of China (2018YFB100283)
  • Language: Chinese
  • Pages: MESS201903005
  • Page count: 9
  • CN: 03
  • ISSN: 11-2325/N
  • Classification number: 38-46
Abstract
HowNet is a large-scale, high-quality cross-lingual (Chinese-English) commonsense knowledge base that contains rich semantic information. This paper uses methods from the knowledge graph field to decompose HowNet's complex structure layer by layer, yielding HownetGraph, a representation of HowNet in knowledge graph form. Network representation learning and knowledge representation learning methods are then applied to obtain cross-lingual (Chinese and English) vector representations of several semantic units: characters and words, senses①, DEF_CONCEPTs②, and sememes. Experiments on word similarity and word analogy are conducted on Chinese and English datasets, and the results show that the proposed method achieves the best performance on the word semantic similarity task.
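The word similarity task mentioned in the abstract is commonly scored by comparing model similarities against human ratings. The following is a minimal sketch of that kind of evaluation, not the authors' code: it assumes cosine similarity between vectors and Spearman rank correlation as the metric, neither of which is stated in this record. The toy embeddings, the gold scores, and all function names are hypothetical and only illustrate the procedure.

```python
import math


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def ranks(values):
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    std_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    std_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (std_x * std_y)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the rank vectors."""
    return pearson(ranks(x), ranks(y))


def evaluate(embeddings, gold_pairs):
    """Correlate model similarities with human scores, skipping pairs
    that contain a word without an embedding."""
    predicted, gold = [], []
    for w1, w2, score in gold_pairs:
        if w1 in embeddings and w2 in embeddings:
            predicted.append(cosine(embeddings[w1], embeddings[w2]))
            gold.append(score)
    return spearman(predicted, gold)


# Hypothetical 2-dimensional embeddings and human similarity ratings.
toy_embeddings = {
    "cat": [1.0, 0.0],
    "dog": [0.9, 0.1],
    "car": [0.0, 1.0],
    "truck": [0.2, 0.8],
}
toy_gold = [
    ("cat", "dog", 9.0),
    ("car", "truck", 8.5),
    ("cat", "car", 1.0),
    ("dog", "truck", 2.0),
]

rho = evaluate(toy_embeddings, toy_gold)
print(f"Spearman correlation: {rho:.4f}")
```

A rank-based metric only rewards getting the ordering of word pairs right, so embeddings trained with different methods and scales can be compared on the same gold list.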
References
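[1]Dong Zhendong,Dong Qiang.HowNet and Chinese language research[J].Contemporary Linguistics,2001,3(1):33-44.(in Chinese)
    [2]Niu Y,Xie R,Liu Z,et al.Improved word representation learning with sememes[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics.2017(1):2049-2058.
    [3]Liu Qun,Li Sujian.Word semantic similarity computation based on HowNet[J].Computational Linguistics and Chinese Language Processing,2002,7(2):59-76.(in Chinese)
    [4]Mei Lijun,Zhou Qiang,Zang Lu,et al.Research on merging information from HowNet and Tongyici Cilin[J].Journal of Chinese Information Processing,2005,19(1):64-71.(in Chinese)
    [5]Sun Jingguang,Cai Dongfeng,Lü Dexin,et al.Automatic classification of Chinese questions based on HowNet[J].Journal of Chinese Information Processing,2007,21(1):90-95.(in Chinese)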
    [6]Yan J,Bracewell D B,Ren F,et al.The creation of a Chinese emotion ontology based on HowNet[J].Engineering Letters,2008,16(1):166-171.
    [7]Tang Yi,Zhou Changle,Lian Ruiting.Chinese semantic dependency analysis based on HowNet[J].Mind and Computation,2010(2):109-116.(in Chinese)
    [8]Liu J,Xu J,Zhang Y.An approach of hybrid hierarchical structure for word similarity computing by HowNet[C]//Proceedings of the 6th International Joint Conference on Natural Language Processing,2013:927-931.
    [9]Xiang Chuncheng,Sui Zhifang,Zhan Weidong.Research on mapping methods between HowNet and CCD[J].Journal of Chinese Information Processing,2015,29(3):44-51.(in Chinese)
    [10]Zeng X,Yang C,Tu C,et al.Chinese LIWC lexicon expansion via hierarchical classification of word embeddings with sememe attention[C]//Proceedings of AAAI 2018,2018.
    [11]Perozzi B,Al-Rfou R,Skiena S.DeepWalk:Online learning of social representations[C]//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2014:701-710.
    [12]Tang J,Qu M,Wang M,et al.LINE:Large-scale information network embedding[C]//Proceedings of the 24th International Conference on World Wide Web.International World Wide Web Conferences Steering Committee,2015:1067-1077.
    [13]Grover A,Leskovec J.node2vec:Scalable feature learning for networks[C]//Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.ACM,2016:855-864.
    [14]Cao S,Lu W,Xu Q.Grarep:Learning graph representations with global structural information[C]//Proceedings of the 24th ACM International Conference on Information and Knowledge Management.ACM,2015:891-900.
    [15]Kipf T N,Welling M.Semi-supervised classification with graph convolutional networks[J].arXiv preprint arXiv:1609.02907,2016.
    [16]Yang C,Liu Z,Zhao D,et al.Network representation learning with rich text information[C]//Proceedings of the 24th IJCAI,2015:2111-2117.
    [17]Tu C,Liu H,Liu Z,et al.Cane:Context-aware network embedding for relation modeling[C]//Proceedings of the 55th Annual Meeting of the Association for Computational Linguistics,2017,1:1722-1731.
    [18]Bordes A,Usunier N,Garcia-Duran A,et al.Translating embeddings for modeling multi-relational data[C]//Proceedings of the 27th Annual Conference on Neural Information Processing Systems,2013:2787-2795.
    [19]Wang Z,Zhang J,Feng J,et al.Knowledge graph embedding by translating on hyperplanes[C]//Proceedings of the 14th AAAI Conference on Artificial Intelligence,2014(14):1112-1119.
    [20]Lin Y,Liu Z,Sun M,et al.Learning entity and relation embeddings for knowledge graph completion[C]//Proceedings of the 29th AAAI Conference on Artificial Intelligence,2015(15):2181-2187.
    [21]Ji G,He S,Xu L,et al.Knowledge graph embedding via dynamic mapping matrix[C]//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics and the 7th International Joint Conference on Natural Language Processing,2015,1:687-696.
    [22]Ji G,Liu K,He S,et al.Knowledge graph completion with adaptive sparse transfer matrix[C]//Proceedings of the 30th AAAI Conference on Artificial Intelligence,2016:985-991.
    [23]Xiao H,Huang M,Hao Y,et al.TransG:A generative mixture model for knowledge graph embedding[J].arXiv preprint arXiv:1509.05488,2015.
    [24]He S,Liu K,Ji G,et al.Learning to represent knowledge graphs with gaussian embedding[C]//Proceedings of the 24th ACM International Conference on Information and Knowledge Management.ACM,2015:623-632.
    [25]Chen X,Xu L,Liu Z,et al.Joint learning of character and word embeddings[C]//Proceedings of IJCAI,2015:1236-1242.
    [26]Neelakantan A,Shankar J,Passos A,et al.Efficient non-parametric estimation of multiple embeddings per word in vector space[J].arXiv preprint arXiv:1504.06654,2015.
    [27]Xie R,Yuan X,Liu Z,et al.Lexical sememe prediction via word embeddings and matrix factorization[C]//Proceedings of the 26th International Joint Conference on Artificial Intelligence.AAAI Press,2017:4200-4206.
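    (1)HowNet refers to a "sense" as a "concept"; to avoid confusion with "concept" as used in knowledge graphs, this paper uses "sense" throughout.
    (2)The DEF unit in HownetGraph, a semantic unit that sits between senses and sememes; see Section 2.2.2 for details.
    (1)The 2012 version.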
    (1)https://github.com/Leonard-Xu/CWE/tree/master/data
    (2)http://www.cs.technion.ac.il/~gabr/resources/data/wordsim353/
    (3)https://github.com/thunlp/KB2E
    (4)https://github.com/thunlp/openne
