General-Purpose Text Representations Based on a Gated Mean-Max Autoencoder (基于门控联合池化自编码器的通用性文本表征)
  • English Title: Gated Mean-Max Autoencoder for Text Representations
  • Authors: 张明华 (ZHANG Minghua); 吴云芳 (WU Yunfang); 李伟康 (LI Weikang); 张仰森 (ZHANG Yangsen)
  • Affiliations: MOE Key Laboratory of Computational Linguistics, Peking University; Computer School, Beijing Information Science and Technology University
  • Keywords: text representations; autoencoder; multi-head self-attention mechanism
  • Journal: 中文信息学报 (Journal of Chinese Information Processing), journal code MESS
  • Publication Date: 2019-03-15
  • Year: 2019
  • Volume/Issue: Vol. 33, No. 03
  • Pages: 30-37 (8 pages)
  • Article ID: MESS201903004
  • CN: 11-2325/N
  • Funding: National Natural Science Foundation of China (61773026, 61772081)
  • Language: Chinese
Abstract
To learn semantic representations of text, previous work has relied mainly on complex recurrent neural networks (RNNs) and supervised learning. This paper proposes a gated mean-max autoencoder (gated mean-max AAE) for learning semantic representations of Chinese and English text. The encoder and decoder networks are built entirely from the multi-head self-attention mechanism. In the encoding stage, a mean-max joint representation strategy is proposed: mean pooling and max pooling are applied simultaneously to capture the diverse semantic information in the input text. To let the joint pooled representation fully guide the reconstruction process, the decoder uses a gating operation to attend to it dynamically. Training the model on large-scale unlabelled Chinese and English corpora yields high-quality sentence encoders. In experiments on reconstructing text paragraphs, the proposed model outperforms traditional RNN models in both effectiveness and computational efficiency. The trained text encoders will be released publicly so that they can be conveniently used in future research.
        In order to learn distributed representations of text sequences, previous methods focus on complex recurrent neural networks or supervised learning. In this paper, we propose a gated mean-max autoencoder for both Chinese and English text representations. In our model, we simply rely on the multi-head self-attention mechanism to construct the encoder and decoder. In the encoding stage, we propose a mean-max strategy that applies both mean and max pooling operations over the hidden vectors to capture diverse information of the input. To enable this information to steer the reconstruction process, the decoder employs an element-wise gate to select between the mean and max representations dynamically. By training our model on large amounts of unlabelled Chinese and English data respectively, we obtain high-quality text encoders, which we make publicly available. Experimental results of reconstructing coherent long texts from the encoded representations demonstrate the superiority of our model over traditional recurrent neural networks, in terms of both performance and complexity.
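
The following is a minimal PyTorch sketch of the two mechanisms the abstract describes: mean-max joint pooling over the self-attention outputs, and an element-wise gate that lets the decoder choose dynamically between the mean and max codes. All layer sizes, class names, and the exact conditioning of the gate (here, on the current decoder state together with the two codes) are illustrative assumptions, not the authors' released implementation.

    import torch
    import torch.nn as nn

    class MeanMaxEncoder(nn.Module):
        """Multi-head self-attention encoder with mean-max joint pooling."""
        def __init__(self, d_model=512, n_heads=8, n_layers=2):
            super().__init__()
            layer = nn.TransformerEncoderLayer(d_model, n_heads, batch_first=True)
            self.encoder = nn.TransformerEncoder(layer, n_layers)

        def forward(self, x):
            # x: (batch, seq_len, d_model) token embeddings
            h = self.encoder(x)           # contextualised hidden vectors
            z_mean = h.mean(dim=1)        # mean pooling over the sequence
            z_max = h.max(dim=1).values   # max pooling over the sequence
            return z_mean, z_max          # the joint mean-max representation

    class GatedCombiner(nn.Module):
        """Element-wise gate mixing the mean and max codes at each decoding step."""
        def __init__(self, d_model=512):
            super().__init__()
            self.gate = nn.Linear(3 * d_model, d_model)

        def forward(self, dec_state, z_mean, z_max):
            # dec_state: (batch, d_model) current decoder hidden state.
            # The sigmoid gate chooses, per dimension, how much to read
            # from the mean code versus the max code (conditioning on the
            # decoder state is an assumption about the exact formulation).
            g = torch.sigmoid(self.gate(torch.cat([dec_state, z_mean, z_max], dim=-1)))
            return g * z_mean + (1.0 - g) * z_max

    # Usage: encode a batch of 4 sequences of 20 token embeddings each.
    enc, comb = MeanMaxEncoder(), GatedCombiner()
    tokens = torch.randn(4, 20, 512)
    z_mean, z_max = enc(tokens)
    context = comb(torch.randn(4, 512), z_mean, z_max)  # would be fed to the decoder
    print(context.shape)  # torch.Size([4, 512])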
References
[1]Mikolov T,Chen K,Corrado G,et al.Efficient estimation of word representations in vector space[J].arXiv preprint arXiv:1301.3781,2013.
    [2]Kiros R,Zhu Y,Salakhutdinov R R,et al.Skip-thought vectors[J].arXiv preprint arXiv:1506.06726,2015.
    [3]Ba J L,Kiros J R,Hinton G E.Layer normalization[J].arXiv preprint arXiv:1607.06450,2016.
    [4]Hill F,Cho K,Korhonen A.Learning distributed representations of sentences from unlabelled data[J].arXiv preprint arXiv:1602.03483,2016.
    [5]Gan Z,Pu Y,Henao R,et al.Learning generic sentence representations using convolutional neural networks[C]//Proceedings of the 2017 Conference on Empirical Methods in Natural Language Processing,2017:2390-2400.
    [6]Conneau A,Kiela D,Schwenk H,et al.Supervised learning of universal sentence representations from natural language inference data[J].arXiv preprint arXiv:1705.02364,2017.
    [7]Cer D,Yang Y,Kong S,et al.Universal sentence encoder[J].arXiv preprint arXiv:1803.11175,2018.
    [8]Bowman S R,Angeli G,Potts C,et al.A large annotated corpus for learning natural language inference[J].arXiv preprint arXiv:1508.05326,2015.
    [9]Li J,Luong M T,Jurafsky D.A hierarchical neural autoencoder for paragraphs and documents[J].arXiv preprint arXiv:1506.01057,2015.
    [10]Le Q,Mikolov T.Distributed representations of sentences and documents[C]//Proceedings of the 31st International Conference on Machine Learning,2014:1188-1196.
    [11]Mikolov T,Sutskever I,Chen K,et al.Distributed representations of words and phrases and their compositionality[J].arXiv preprint arXiv:1310.4546,2013.
    [12]Arora S,Liang Y,Ma T.A simple but tough-to-beat baseline for sentence embeddings[C]//Proceedings of the 5th International Conference on Learning Representations,2016.
    [13]Henderson M,Al-Rfou R,Strope B,et al.Efficient natural language response suggestion for smart reply[J].arXiv preprint arXiv:1705.00652,2017.
    [14]Bahdanau D,Cho K,Bengio Y.Neural machine translation by jointly learning to align and translate[J].arXiv preprint arXiv:1409.0473,2014.
    [15]Vaswani A,Shazeer N,Parmar N,et al.Attention is all you need[J].arXiv preprint arXiv:1706.03762,2017.
    [16]Manning C,Surdeanu M,Bauer J,et al.The Stanford CoreNLP natural language processing toolkit[C]//Proceedings of 52nd Annual Meeting of the Association for Computational Linguistics:System Demonstrations,2014:55-60.
    [17]Kingma D P,Ba J.Adam:A method for stochastic optimization[J].arXiv preprint arXiv:1412.6980,2014.
    [18]Glorot X,Bengio Y.Understanding the difficulty of training deep feedforward neural networks[C]//Proceedings of the 13th International Conference on Artificial Intelligence and Statistics,2010:249-256.
