Research on word vector accuracy using variant context window
  • Authors: HU Zheng; YANG Zhiyong
  • Keywords: word vector; word embedding; context window; natural language processing; neural network; deep learning
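  • Journal: Modern Electronics Technique (现代电子技术), journal code XDDJ
  • Affiliation: School of Software, Nanchang Hangkong University
  • Published: 2019-03-13 07:02
  • Year: 2019
  • Volume/issue: v.42, issue 06 (No.533)
  • Funding: National Natural Science Foundation of China (61501218)
  • Language: Chinese
  • Article ID: XDDJ201906036
  • Pages: 154-156+161
  • Page count: 4
  • CN: 61-1224/TN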
Abstract
The accuracy of word vectors has a considerable effect on the performance of natural language processing tasks. Word vectors are produced by word embedding, and word embedding methods take a target word together with its context as the training input, so the choice of context strongly influences the resulting embeddings. Using the word2vec embedding method, this paper studies how variant context windows affect word embedding accuracy. A series of experiments was carried out with context windows of varying width, offset, and weighting. The results show that changing the context window has only a small effect on the overall accuracy of the trained vectors, but a significant effect on the accuracy of many individual words. We conclude that the context window best suited to a word differs considerably from word to word, so a single uniform context window can hardly achieve optimal training for all words.
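The abstract names three window parameters (width, offset, and weight) but does not define them. As a minimal sketch, assuming a skip-gram-style setup in which each target word is paired with the words inside its window, one plausible parameterization is shown below. The names `context_pairs`, `harmonic`, `left`, `right`, `offset`, and `weight_fn` are hypothetical and are not taken from the paper. For comparison, the reference word2vec tool takes a single symmetric `-window` size and, for each target, samples a reduced effective window between 1 and that maximum, which implicitly gives nearer words more weight.

```python
from typing import Callable, Iterator, Optional, Sequence, Tuple

# Hypothetical: a weighting function maps a relative position (e.g. -2, +1)
# to the weight of the (target, context) pair at that position.
WeightFn = Callable[[int], float]


def context_pairs(
    tokens: Sequence[str],
    left: int,
    right: int,
    offset: int = 0,
    weight_fn: Optional[WeightFn] = None,
) -> Iterator[Tuple[str, str, float]]:
    """Yield (target, context, weight) pairs under a variant context window.

    Assumed parameterization (not the paper's own definition):
      left, right: window width on each side of the (shifted) centre
      offset:      shifts the whole window; relative positions become
                   d + offset for d in [-left, right]
      weight_fn:   weight for a relative position; uniform if None
    """
    for i, target in enumerate(tokens):
        for d in range(-left, right + 1):
            rel = d + offset
            if rel == 0:
                continue  # skip the target's own position
            j = i + rel
            if not 0 <= j < len(tokens):
                continue  # window runs past the sentence boundary
            w = 1.0 if weight_fn is None else weight_fn(rel)
            if w > 0.0:
                yield target, tokens[j], w


def harmonic(rel: int) -> float:
    """Distance-decaying weight 1/|rel|, so nearer words count more."""
    return 1.0 / abs(rel)


if __name__ == "__main__":
    sentence = "the cat sat on the mat".split()
    # Symmetric window of width 2 on each side, uniform weights.
    sym = list(context_pairs(sentence, left=2, right=2))
    # Window shifted one position to the right, distance-weighted.
    shifted = list(context_pairs(sentence, left=1, right=2, offset=1,
                                 weight_fn=harmonic))
    print(len(sym), len(shifted))
    print([p for p in shifted if p[0] == "cat"])
```

For the six-word sentence, the symmetric width-2 window yields 18 pairs, while the right-shifted window covers only positions +1 to +3 and yields 12 pairs weighted by 1/|rel|. Passing the weight as a callable keeps uniform, distance-decaying, or other schemes interchangeable, so the same generator can drive a sweep over width, offset, and weight settings.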
