Diachronic Deviation Features in Continuous Space Word Representations
  • Authors: Ni Sun (21) (22)
    Tongfei Chen (21)
    Liumingjing Xiao (21)
    Junfeng Hu (21) (22)
  • Keywords: Lexical semantics; diachronic corpora; semantic distribution; hot topics
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2014
  • Volume: 8801
  • Issue: 1
  • Pages: 23-33
  • Full-text size: 655 KB
  • References: 1. Bengio, Y., Ducharme, R., Vincent, P., Jauvin, C.: A neural probabilistic language model. JMLR 3, 1137–1155 (2003)
    2. Blei, D.M., Ng, A.Y., Jordan, M.I.: Latent Dirichlet allocation. JMLR 3, 993–1022 (2003)
    3. Collobert, R., Weston, J., Bottou, L., Karlen, M., Kavukcuoglu, K., Kuksa, P.: Natural language processing (almost) from scratch. JMLR 12, 2493–2537 (2011)
    4. He, S., Zou, X., Xiao, L., Hu, J.: Construction of diachronic ontologies from People's Daily of fifty years. In: LREC (2014)
    5. Kleinberg, J.M.: Hubs, authorities, and communities. ACM Computing Surveys 31(4es), 5 (1999)
    6. Michel, J.B., Shen, Y.K., Aiden, A.P., Veres, A., Gray, M.K., Pickett, J.P., Hoiberg, D., Clancy, D., Norvig, P., Orwant, J., et al.: Quantitative analysis of culture using millions of digitized books. Science 331(6014), 176–182 (2011)
    7. Mikolov, T., Chen, K., Corrado, G., Dean, J.: Efficient estimation of word representations in vector space. arXiv preprint arXiv:1301.3781 (2013)
    8. Mikolov, T., Le, Q.V., Sutskever, I.: Exploiting similarities among languages for machine translation. arXiv preprint arXiv:1309.4168 (2013)
    9. Zhang, H.P., Yu, H.K., Xiong, D.Y., Liu, Q.: HHMM-based Chinese lexical analyzer ICTCLAS. In: SIGHAN, pp. 184–187 (2003)
  • Author affiliations: Ni Sun (21) (22)
    Tongfei Chen (21)
    Liumingjing Xiao (21)
    Junfeng Hu (21) (22)

    21. School of Electronics Engineering & Computer Science, Peking University, Beijing, P.R. China
    22. Key Laboratory of Computational Linguistics (Ministry of Education), P.R. China
  • ISSN: 1611-3349
Abstract
In distributed word representation, each word is represented as a single point in a vector space. This paper extends the approach to a diachronic setting, in which separate word embeddings are trained on corpora from different time periods. These embeddings can be mapped into a single target space via a linear transformation, so that each word is represented in the target space as a distribution rather than a point. The deviation features of this distribution reflect the semantic variation of a word across time periods. Experiments show that groups of words with similar deviation features can indicate the hot topics of different eras, and that the frequency changes of these word groups can be used to detect the period in which a topic reached its peak popularity.
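The mapping step described in the abstract can be illustrated with a minimal sketch. The abstract does not say how the linear transformation is fitted or how the deviation features are defined, so everything below is an assumption made for illustration: the map is fitted by ordinary least squares over a shared vocabulary, one period serves as the target space, and "spread" (the mean distance of a word's mapped points from their centroid) stands in for a deviation feature. The function names `fit_linear_map` and `deviation_features` and the synthetic data are hypothetical and not taken from the paper.

```python
import numpy as np


def fit_linear_map(source, target):
    """Return W minimising ||source @ W - target|| in the least-squares sense.

    Assumption: rows of `source` and `target` refer to the same words.
    """
    W, *_ = np.linalg.lstsq(source, target, rcond=None)
    return W


def deviation_features(period_embeddings, target_index=0):
    """Map every period into one target space and measure per-word spread.

    period_embeddings: list of (n_words, dim) arrays with aligned rows.
    Returns (centroids, spread); spread[i] is the mean Euclidean distance
    of word i's mapped points from their centroid (an assumed stand-in
    for the paper's deviation features).
    """
    target = period_embeddings[target_index]
    mapped = [
        emb if i == target_index else emb @ fit_linear_map(emb, target)
        for i, emb in enumerate(period_embeddings)
    ]
    points = np.stack(mapped)  # shape: (n_periods, n_words, dim)
    centroids = points.mean(axis=0)
    spread = np.linalg.norm(points - centroids, axis=2).mean(axis=0)
    return centroids, spread


# Synthetic demo (made-up data): three "periods" that are rotations of the
# same base embedding, except that word 0 is altered in the last period.
rng = np.random.default_rng(0)
n_words, dim = 200, 5
base = rng.normal(size=(n_words, dim))


def random_rotation():
    """Random orthogonal matrix obtained from a QR decomposition."""
    q, _ = np.linalg.qr(rng.normal(size=(dim, dim)))
    return q


drifted = base.copy()
drifted[0] = -base[0]  # word 0 "changes meaning" in the last period
periods = [base, base @ random_rotation(), drifted @ random_rotation()]

centroids, spread = deviation_features(periods)
most_variable_word = int(np.argmax(spread))
```

In this synthetic setup only word 0 is constructed to drift, so it should receive the largest spread, while words whose vectors differ between periods only by the rotation map back almost exactly onto their target positions. Real diachronic embeddings would of course be noisier, and the choice of target period and distance measure would need to follow the paper itself.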