A Survey on Distributed Word Representation (分布式单词表示综述)
  • English Title: A Survey on Distributed Word Representation
  • Authors: SUN Fei (孙飞); GUO Jia-Feng (郭嘉丰); LAN Yan-Yan (兰艳艳); XU Jun (徐君); CHENG Xue-Qi (程学旗)
  • Keywords: word representation; distributed representation; distributed word representation; representation learning; deep learning
  • Journal: Chinese Journal of Computers (计算机学报), journal code JSJX
  • Affiliations: CAS Key Lab of Network Data Science and Technology; Institute of Computing Technology, Chinese Academy of Sciences; University of Chinese Academy of Sciences
  • First published online: 2016-09-26 13:55
  • Year: 2019
  • Volume/Issue: Vol. 42, No. 439, Issue 07
  • Funding: National Key Basic Research Program of China (973 Program) (2014CB340401, 2013CB329606); National Natural Science Foundation of China (61232010, 61472401, 61425016, 61203298); Youth Innovation Promotion Association, Chinese Academy of Sciences (20144310, 2016102)
  • Language: Chinese
  • Record ID: JSJX201907010
  • Pages: 169-189 (21 pages)
  • CN: 11-1826/TP
Abstract
As a fundamental problem in natural language processing, word representation has long attracted wide attention. Traditional one-hot representations lose the semantic relations between words and therefore suffer from data sparsity in practice. Distributed representations instead encode words as low-dimensional, dense, real-valued vectors that capture the relational information between words. Such representations allow the semantic relatedness of words to be computed efficiently in a low-dimensional space, effectively alleviating the data sparsity problem. As the basic input to neural network models, distributed word representations have, together with deep learning, been applied throughout natural language processing. From early latent semantic analysis to recent neural network models, researchers have proposed a wide variety of models for learning distributed word representations. This paper traces the development of distributed word representation learning and, starting from how each model uses context, unifies these models under the distributional hypothesis: they differ only in which context of a word they model. Topic models, represented by latent semantic analysis, use documents as context and model the syntagmatic relations between words; work represented by neural network language models uses the surrounding words as context and models the paradigmatic relations between words. This paper also summarizes the main challenges currently facing distributed word representation, including the representation of polysemous words, representation learning for rare words, fine-grained semantic modeling, the interpretability of word representations, and the evaluation of word representations, and reviews the latest solutions. Finally, it discusses future directions and prospects for word representation.
        As a fundamental problem in natural language processing, word representation has long received wide attention. Traditional one-hot representations suffer from data sparsity in practice because they discard the semantic relations between words. Distributed word representations instead encode the meanings of words as dense, real-valued vectors in a low-dimensional space, which alleviates the data sparsity issue. As the inputs of neural network models, distributed word representations have been widely used across natural language processing along with deep learning. From latent semantic indexing to neural language models, researchers have developed various methods to learn distributed word representations. In this paper, we trace the development of models for learning distributed word representations and find that all of these models are built on the distributional hypothesis, differing only in their choice of context. From this perspective, the models fall into two classes, syntagmatic and paradigmatic: models like latent semantic indexing use documents as the contexts of words to capture the syntagmatic relations between words, while models like neural language models capture the paradigmatic relations between words through the contexts surrounding each word. We then summarize the key challenges and the latest solutions, such as representations for polysemous words and rare words, fine-grained semantic modeling, interpretability of distributed word representations, and evaluation of word representations. Finally, we give an outlook on future research and application directions.
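The contrast between one-hot and distributed representations described in the abstract can be illustrated with a minimal numpy sketch: build a toy term-document count matrix, factor it with truncated SVD (the core of latent semantic analysis), and compare word similarities before and after. The corpus, vocabulary, and dimensionality below are illustrative assumptions, not from the paper.

```python
import numpy as np

# Toy term-document count matrix (rows: words, columns: documents).
# "cat" and "kitten" never share a document, but both co-occur with "purrs".
vocab = ["cat", "kitten", "purrs", "dog", "puppy", "barks"]
X = np.array([
    [1, 0, 0, 0],   # cat    appears in doc 0
    [0, 1, 0, 0],   # kitten appears in doc 1
    [1, 1, 0, 0],   # purrs  appears in docs 0 and 1
    [0, 0, 1, 0],   # dog    appears in doc 2
    [0, 0, 0, 1],   # puppy  appears in doc 3
    [0, 0, 1, 1],   # barks  appears in docs 2 and 3
], dtype=float)

def cosine(u, v):
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v) + 1e-12))

# Truncated SVD: keep the top-2 singular directions as word embeddings.
U, S, Vt = np.linalg.svd(X, full_matrices=False)
W = U[:, :2] * S[:2]
w = dict(zip(vocab, W))

# In the raw, one-hot-like space the similarity of cat and kitten is 0;
# the low-dimensional space recovers their second-order co-occurrence.
print(cosine(X[0], X[1]))             # raw counts -> 0.0
print(cosine(w["cat"], w["kitten"]))  # reduced space -> ~1.0
print(cosine(w["cat"], w["dog"]))     # unrelated words stay dissimilar
```

This is the "documents as context" (syntagmatic) setting from the abstract; swapping the document columns for window co-occurrence counts would give the paradigmatic variant.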
References
[1]Bengio Y,Courville A,Vincent P.Representation learning:A review and new perspectives.IEEE Transactions on Pattern Analysis and Machine Intelligence,2013,35(8):1798-1828
    [2]Manning C D.Computational linguistics and deep learning.Computational Linguistics,2015,41(4):701-707
    [3]Bengio Y,Ducharme R,Vincent P,et al.A neural probabilistic language model.Journal of Machine Learning Research,2003,3:1137-1155
    [4]Mikolov T,Chen K,Corrado G,et al.Efficient estimation of word representations in vector space//Proceedings of Workshop of ICLR.Scottsdale,USA,2013:1-12
    [5]Tang D,Wei F,Yang N,et al.Learning sentiment-specific word embedding for twitter sentiment classification//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:1555-1565
    [6]Maas A L,Daly R E,Pham P T,et al.Learning word vectors for sentiment analysis//Proceedings of the 49th Annual Meeting of the Association for Computational Linguistics:Human Language Technologies.Portland,USA,2011:142-150
    [7]Socher R,Manning C D,Ng A Y.Learning continuous phrase representations and syntactic parsing with recursive neural networks//Proceedings of the Deep Learning and Unsupervised Feature Learning Workshop of NIPS 2010.Vancouver,Canada,2010:1-9
    [8]Socher R,Bauer J,Manning C D,et al.Parsing with compositional vector grammars//Proceedings of the 51st Annual Meeting of the Association for Computational Linguistics.Sofia,Bulgaria,2013:455-465
    [9]Collobert R,Weston J.A unified architecture for natural language processing:Deep neural networks with multitask learning//Proceedings of the 25th International Conference on Machine Learning.Helsinki,Finland,2008:160-167
    [10]Bahdanau D,Cho K,Bengio Y.Neural machine translation by jointly learning to align and translate.arXiv preprint arXiv:1409.0473,2014
    [11]Mikolov T,Le Q V,Sutskever I.Exploiting similarities among languages for machine translation.arXiv preprint arXiv:1309.4168,2013
    [12]Zou W Y,Socher R,Cer D,et al.Bilingual word embeddings for phrase-based machine translation//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.Seattle,USA,2013:1393-1398
    [13]Nguyen D Q,Billingsley R,Du L,et al.Improving topic models with latent feature word representations.Transactions of the Association for Computational Linguistics,2015,3:299-313
    [14]Das R,Zaheer M,Dyer C.Gaussian LDA for topic models with word embeddings//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:795-804
    [15]Yang M,Cui T,Tu W.Ordering-sensitive and semantic-aware topic modeling//Proceedings of the 29th AAAI Conference on Artificial Intelligence.Austin,USA,2015:2353-2359
    [16]Manning C D,Raghavan P,Schütze H.Introduction to Information Retrieval.New York,USA:Cambridge University Press,2008
    [17]Croft B,Metzler D,Strohman T.Search Engines:Information Retrieval in Practice.USA:Addison-Wesley Publishing,2009
    [18]Hill F,Cho K,Korhonen A.Learning distributed representations of sentences from unlabelled data//Proceedings of the 15th Annual Conference of the North American Chapter of the Association for Computational Linguistics.San Diego,California,2016:1367-1377
    [19]Nalisnick E,Mitra B,Craswell N,et al.Improving document ranking with dual word embeddings//Proceedings of the 25th International Conference Companion on World Wide Web.Republic and Canton of Geneva,Switzerland,2016:83-84
    [20]Hinton G E,McClelland J L,Rumelhart D E.Distributed representations//Rumelhart D E,McClelland J L,and the PDP Research Group eds.Parallel Distributed Processing:Explorations in the Microstructure of Cognition,Vol.1.Cambridge,USA:MIT Press,1986:77-109
    [21]Pennington J,Socher R,Manning C D.GloVe:Global vectors for word representation//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:1532-1543
    [22]Collobert R,Weston J,Bottou L,et al.Natural language processing(almost)from scratch.Journal of Machine Learning Research,2011,12:2493-2537
    [23]Mnih A,Hinton G.Three new graphical models for statistical language modelling//Proceedings of the 24th International Conference on Machine Learning.Oregon,USA,2007:641-648
    [24]Hill F,Cho K,Korhonen A,et al.Learning to understand phrases by embedding the dictionary.Transactions of the Association for Computational Linguistics,2016,4:17-30
    [25]Cho K,van Merrienboer B,Gulcehre C,et al.Learning phrase representations using RNN encoder-decoder for statistical machine translation//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:1724-1734
    [26]Lebret R,Collobert R.The sum of its parts:Joint learning of word and phrase representations with autoencoders//ICML Deep Learning Workshop.Lille,France,2015
    [27]Mikolov T,Sutskever I,Chen K,et al.Distributed representations of words and phrases and their compositionality//Proceedings of the Advances in Neural Information Processing Systems 26.Lake Tahoe,USA,2013:3111-3119
    [28]Socher R,Huval B,Manning C D,et al.Semantic compositionality through recursive matrix-vector spaces//Proceedings of the Conference on Empirical Methods in Natural Language Processing.Stroudsburg,USA,2012:1201-1211
    [29]Paccanaro A,Hinton G E.Learning distributed representations of concepts using linear relational embedding.IEEE Transactions on Knowledge and Data Engineering,2001,13(2):232-244
    [30]Hu B,Lu Z,Li H,et al.Convolutional neural network architectures for matching natural language sentences//Proceedings of the Advances in Neural Information Processing Systems 27.Montréal,Canada,2014:2042-2050
    [31]Kalchbrenner N,Grefenstette E,Blunsom P.A convolutional neural network for modelling sentences//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:655-665
    [32]Le Q,Mikolov T.Distributed representations of sentences and documents//Proceedings of the 31st International Conference on Machine Learning.2014:1188-1196
    [33]Tang J,Qu M,Wang M,et al.LINE:Large-scale information network embedding//Proceedings of the 24th International Conference on World Wide Web.Florence,Italy,2015:1067-1077
    [34]Perozzi B,Al-Rfou R,Skiena S.DeepWalk:Online learning of social representations//Proceedings of the 20th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.New York,USA,2014:701-710
    [35]Miikkulainen R,Dyer M G.Natural language processing with modular PDP networks and distributed lexicon.Cognitive Science,1991,15:343-399
    [36]Rumelhart D E,McClelland J L,and the PDP Research Group.Parallel Distributed Processing:Explorations in the Microstructure of Cognition.Cambridge,MA,USA:MIT Press,1986
    [37]Bengio Y,Ducharme R,Vincent P.A neural probabilistic language model//Proceedings of the Advances in Neural Information Processing Systems 13.Vancouver,Canada,2001:932-938
    [38]Xu W,Rudnicky A.Can artificial neural networks learn language models?//Proceedings of the 6th International Conference on Spoken Language Processing.Beijing,China,2000:202-205
    [39]Hinton G E,Salakhutdinov R R.Reducing the dimensionality of data with neural networks.Science,2006,313(5786):504-507
    [40]Schwenk H,Gauvain J-L.Connectionist language modeling for large vocabulary continuous speech recognition//Proceedings of the IEEE International Conference on Acoustics,Speech,and Signal Processing.Orlando,USA,2002,1:765-768
    [41]Ponte J M,Croft W B.A language modeling approach to information retrieval//Proceedings of the 21st Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Melbourne,Australia,1998:275-281
    [42]Kneser R,Ney H.Improved backing-off for M-gram language modeling//Proceedings of the International Conference on Acoustics,Speech,and Signal Processing.Michigan,USA,1995,1:181-184
    [43]Brown P F,deSouza P V,Mercer R L,et al.Class-based N-gram models of natural language.Computational Linguistics,1992,18(4):467-479
    [44]Bengio Y,Sénécal J-S.Quick training of probabilistic neural nets by importance sampling//Proceedings of the Conference on Artificial Intelligence and Statistics.Key West,USA,2003
    [45]Mnih A,Teh Y W.A fast and simple algorithm for training neural probabilistic language models//Proceedings of the 29th International Conference on Machine Learning.Edinburgh,Scotland,2012:1751-1758
    [46]Mnih A,Kavukcuoglu K.Learning word embeddings efficiently with noise-contrastive estimation//Proceedings of Advances in Neural Information Processing Systems 26.Lake Tahoe,USA,2013:2265-2273
    [47]Gutmann M U,Hyvärinen A.Noise-contrastive estimation of unnormalized statistical models,with applications to natural image statistics.Journal of Machine Learning Research,2012,13(1):307-361
    [48]Morin F,Bengio Y.Hierarchical probabilistic neural network language model//Proceedings of the 10th International Workshop on Artificial Intelligence and Statistics.Barbados,2005:246-252
    [49]Goodman J.Classes for fast maximum entropy training//Proceedings of the IEEE International Conference on Acoustics,Speech,and Signal Processing.UT,USA,2001:561-564
    [50]Mnih A,Hinton G E.A scalable hierarchical distributed language model//Proceedings of the Advances in Neural Information Processing Systems 21.Vancouver,Canada,2008:1081-1088
    [51]Mikolov T,Kopecky J,Burget L,et al.Neural network based language models for highly inflective languages//Proceedings of the IEEE International Conference on Acoustics,Speech and Signal Processing.Taipei,China,2009:4725-4728
    [52]Okanohara D,Tsujii J.A discriminative language model with pseudo-negative samples//Proceedings of the 45th Annual Meeting of the Association of Computational Linguistics.Prague,Czech Republic,2007:73-80
    [53]Huang E H,Socher R,Manning C D,et al.Improving word representations via global context and multiple word prototypes//Proceedings of the 50th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,USA,2012:873-882
    [54]Luong M-T,Socher R,Manning C D.Better word representations with recursive neural networks for morphology//Proceedings of the 17th Conference on Computational Natural Language Learning.Sofia,Bulgaria,2013:104-113
    [55]Ji S,Yun H,Yanardag P,et al.WordRank:Learning Word Embeddings via Robust Ranking.arXiv preprint arXiv:1506.02761,2015
    [56]Lazaridou A,Dinu G,Baroni M.Hubness and pollution:delving into cross-space mapping for zero-shot learning//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:270-280
    [57]Deerwester S,Dumais S T,Furnas G W,et al.Indexing by latent semantic analysis.Journal of the American Society for Information Science,1990,41(6):391-407
    [58]Levy O,Goldberg Y,Dagan I.Improving distributional similarity with lessons learned from word embeddings.Transactions of the Association for Computational Linguistics,2015,3:211-225
    [59]Caron J.Experiments with LSA scoring:Optimal rank and basis//Berry M W.Computational Information Retrieval.Philadelphia,PA,USA:Society for Industrial and Applied Mathematics,2001:157-169
    [60]Hu X,Cai Z,Franceschetti D,et al.LSA:The first dimension and dimensional weighting//Proceedings of the 25th Annual Conference of the Cognitive Science Society.Boston,USA,2003:587-592
    [61]Hofmann T.Probabilistic latent semantic indexing//Proceedings of the 22nd Annual International ACM SIGIR Conference on Research and Development in Information Retrieval.Berkeley,USA,1999:50-57
    [62]Blei D M,Ng A Y,Jordan M I.Latent Dirichlet allocation.Journal of Machine Learning Research,2003,3:993-1022
    [63]Hotelling H.Relations between two sets of variates.Biometrika,1936,28(3-4):321-377
    [64]Hotelling H.The most predictable criterion.Journal of Educational Psychology,1935,26(2):139-142
    [65]Stratos K,Collins M,Hsu D.Model-based word embeddings from decompositions of count matrices//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:1282-1291
    [66]Dhillon P S,Foster D P,Ungar L H.Eigenwords:Spectral word embeddings.Journal of Machine Learning Research,2015,16:3035-3078
    [67]Dhillon P,Rodu J,Foster D P,et al.Two step CCA:A new spectral method for estimating vector models of words//Proceedings of the 29th International Conference on Machine Learning.Edinburgh,Scotland,2012:1551-1558
    [68]Dhillon P,Foster D P,Ungar L H.Multi-view learning of word embeddings via CCA//Proceedings of the Advances in Neural Information Processing Systems 24.Granada,Spain,2011:199-207
    [69]Lebret R,Collobert R.Word embeddings through Hellinger PCA//Proceedings of the 14th Conference of the European Chapter of the Association for Computational Linguistics.Gothenburg,Sweden,2014:482-490
    [70]Shazeer N,Doherty R,Evans C,et al.Swivel:Improving Embeddings by Noticing What’s Missing.arXiv preprint arXiv:1602.02215,2016
    [71]Firth J R.A synopsis of linguistic theory 1930-55.Studies in Linguistic Analysis(special volume of the Philological Society),1957,1952-59:1-32
    [72]Harris Z.Distributional structure.Word,1954,10(23):146-162
    [73]Mcdonald S,Ramscar M.Testing the distributional hypothesis:The influence of context on judgements of semantic similarity//Proceedings of the 23rd Annual Conference of the Cognitive Science Society.Edinburgh,Scotland,2001:611-616
    [74]Sun F,Guo J,Lan Y,et al.Learning word representations by jointly modeling syntagmatic and paradigmatic relations//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:136-145
    [75]Sahlgren M.The distributional hypothesis.Italian Journal of Linguistics,2008,20(1):33-54
    [76]Tang J,Qu M,Mei Q.PTE:Predictive text embedding through large-scale heterogeneous text networks//Proceedings of the 21st ACM SIGKDD International Conference on Knowledge Discovery and Data Mining.Sydney,Australia,2015:1165-1174
    [77]Levy O,Goldberg Y.Neural word embedding as implicit matrix factorization//Proceedings of the Advances in Neural Information Processing Systems 27.Montreal,Canada,2014:2177-2185
    [78]Church K W,Hanks P.Word association norms,mutual information,and lexicography//Proceedings of the 27th Annual Meeting on Association for Computational Linguistics.Stroudsburg,USA,1989:76-83
    [79]Li Y,Xu L,Tian F,et al.Word embedding revisited:A new representation learning and explicit matrix factorization perspective//Proceedings of the 24th International Joint Conference on Artificial Intelligence.Buenos Aires,Argentina,2015:3650-3656
    [80]Shi T,Liu Z.Linking GloVe with Word2vec.arXiv preprint arXiv:1411.5595,2014
    [81]Suzuki J,Nagata M.A unified learning framework of skipgrams and global vectors//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:186-191
    [82]Turian J,Ratinov L,Bengio Y.Word representations:A simple and general method for semi-supervised learning//Proceedings of the 48th Annual Meeting of the Association for Computational Linguistics.Stroudsburg,USA,2010:384-394
    [83]Baroni M,Dinu G,Kruszewski G.Don't count,predict!A systematic comparison of context-counting vs.context-predicting semantic vectors//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:238-247
    [84]Milajevs D,Kartsaklis D,Sadrzadeh M,et al.Evaluating neural word representations in tensor-based compositional settings//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:708-719
    [85]Finkelstein L,Gabrilovich E,Matias Y,et al.Placing search in context:The concept revisited.ACM Transactions on Information Systems,2002,20(1):116-131
    [86]Hill F,Reichart R,Korhonen A.SimLex-999:Evaluating semantic models with(genuine)similarity estimation.Computational Linguistics,2015,41(4):665-695
    [87]Mikolov T,Yih W,Zweig G.Linguistic regularities in continuous space word representations//Proceedings of the 2013 Conference of the North American Chapter of the Association for Computational Linguistics.Atlanta,USA,2013:746-751
    [88]Arora S,Li Y,Liang Y,et al.Random Walks on Context Spaces:Towards an Explanation of the Mysteries of Semantic Word Embeddings.arXiv preprint arXiv:1502.03520,2015
    [89]Tjong Kim Sang E F,Buchholz S.Introduction to the CoNLL-2000 shared task:Chunking//Proceedings of the 2nd Workshop on Learning Language in Logic and the 4th Conference on Computational Natural Language Learning.Stroudsburg,USA,2000:127-132
    [90]Fan R-E,Chang K-W,Hsieh C-J,et al.LIBLINEAR:A library for large linear classification.Journal of Machine Learning Research,2008,9:1871-1874
    [91]Schnabel T,Labutov I,Mimno D,et al.Evaluation methods for unsupervised word embeddings//Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing.Lisbon,Portugal,2015:298-307
    [92]Landauer T K.On the computational basis of learning and cognition:Arguments from LSA.Psychology of Learning and Motivation,2002,41:43-84
    [93]Ling W,Dyer C,Black A W,et al.Two/Too simple adaptations of Word2Vec for syntax problems//Proceedings of the 2015 Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Denver,Colorado,2015:1299-1304
    [94]Guo J,Che W,Wang H,et al.Revisiting embedding features for simple semi-supervised learning//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:110-120
    [95]Tian F,Dai H,Bian J,et al.A probabilistic model for learning multi-prototype word embeddings//Proceedings of the 25th International Conference on Computational Linguistics:Technical Papers.Dublin,Ireland,2014:151-160
    [96]Neelakantan A,Shankar J,Passos A,et al.Efficient non-parametric estimation of multiple embeddings per word in vector space//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:1059-1069
    [97]Chen X,Liu Z,Sun M.A unified model for word sense representation and disambiguation//Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing.Doha,Qatar,2014:1025-1035
    [98]Miller G A.WordNet:A lexical database for English.Communications of the ACM,1995,38(11):39-41
    [99]Qiu L,Cao Y,Nie Z,et al.Learning word representation considering proximity and ambiguity//Proceedings of the 28th AAAI Conference on Artificial Intelligence.Québec,Canada,2014:1572-1578
    [100]Liu Y,Liu Z,Chua T-S,et al.Topical word embeddings//Proceedings of the 29th AAAI Conference on Artificial Intelligence.Austin Texas,USA,2015:2418-2424
    [101]Liu P,Qiu X,Huang X.Learning context-sensitive word embeddings with neural tensor skip-gram model//Proceedings of the 24th International Joint Conference on Artificial Intelligence.Buenos Aires,Argentina,2015:1284-1290
    [102]Li J,Jurafsky D.Do multi-sense embeddings improve natural language understanding?//Proceedings of the 2015Conference on Empirical Methods in Natural Language Processing.Lisbon,Portugal,2015:1722-1732
    [103]Botha J A,Blunsom P.Compositional morphology for word representations and language modelling//Proceedings of the 31st International Conference on Machine Learning.2014:1899-1907
    [104]Qiu S,Cui Q,Bian J,et al.Co-learning of word representations and morpheme representations//Proceedings of the 25th International Conference on Computational Linguistics.Dublin,Ireland,2014:141-150
    [105]Sun F,Guo J,Lan Y,et al.Inside out:Two jointly predictive models for word representations and phrase representations//Proceedings of the Thirtieth AAAI Conference on Artificial Intelligence.Phoenix,USA,2016:2821-2827
    [106]Ling W,Dyer C,Black A W,et al.Finding function in form:Compositional character models for open vocabulary word representation//Proceedings of the Conference on Empirical Methods in Natural Language Processing.Lisbon,Portugal,2015:1520-1530
    [107]Rubinstein D,Levi E,Schwartz R,et al.How well do distributional models capture different types of semantic knowledge?//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:726-730
    [108]Baker C F,Fillmore C J,Lowe J B.The Berkeley FrameNet project//Proceedings of the 36th Annual Meeting of the Association for Computational Linguistics.Montreal,Canada,1998:86-90
    [109]Yu M,Dredze M.Improving lexical embeddings with semantic knowledge//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:545-550
    [110]Bian J,Gao B,Liu T-Y.Knowledge-powered deep learning for word embedding//Calders T,Esposito F,Hüllermeier E,et al eds.Machine Learning and Knowledge Discovery in Databases.Berlin,Germany:Springer,2014,8724:132-148
    [111]Xu C,Bai Y,Bian J,et al.RC-NET:A general framework for incorporating knowledge into word representations//Proceedings of the 23rd ACM International Conference on Information and Knowledge Management.Shanghai,China,2014:1219-1228
    [112]Liu Q,Jiang H,Wei S,et al.Learning semantic word embeddings based on ordinal knowledge constraints//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:1501-1511
    [113]Faruqui M,Dodge J,Jauhar S K,et al.Retrofitting word vectors to semantic lexicons//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Denver,Colorado,2015:1606-1615
    [114]Griffiths T L,Steyvers M,Tenenbaum J B.Topics in semantic representation.Psychological Review,2007,114(2):211-244
    [115]Schunn C D.The presence and absence of category knowledge in LSA//Proceedings of the 21st Annual Conference of the Cognitive Science Society.Vancouver,Canada,1999:643-648
    [116]Attwell D,Laughlin S B.An energy budget for signaling in the grey matter of the brain.Journal of Cerebral Blood Flow & Metabolism,2001,21(10):1133-1145
    [117]Olshausen B A,Field D J.Sparse coding with an overcomplete basis set:A strategy employed by V1?Vision Research,1997,37(23):3311-3325
    [118]Vinson D P,Vigliocco G.Semantic feature production norms for a large set of objects and events.Behavior Research Methods,2008,40(1):183-190
    [119]Murphy B,Talukdar P,Mitchell T.Learning effective and interpretable semantic models using non-negative sparse embedding//Proceedings of the 24th International Conference on Computational Linguistics.Mumbai,India,2012:1933-1950
    [120]Faruqui M,Tsvetkov Y,Yogatama D,et al.Sparse overcomplete word vector representations//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:1491-1500
    [121]Sun F,Guo J,Lan Y,et al.Sparse word embeddings using ℓ1 regularized online learning//Proceedings of the 25th International Joint Conference on Artificial Intelligence.New York,USA,2016:2915-2921
    [122]Lee D D,Seung H S.Learning the parts of objects by non-negative matrix factorization.Nature,1999,401(6755):788-791
    [123]Turney P D.Similarity of semantic relations.Computational Linguistics,2006,32(3):379-416
    [124]Tsvetkov Y,Faruqui M,Ling W,et al.Evaluation of word vector representations by subspace alignment//Proceedings of the Conference on Empirical Methods in Natural Language Processing.Lisbon,Portugal,2015:2049-2054
    [125]Chen Y,Perozzi B,Al-Rfou R,et al.The expressive power of word embeddings//Proceedings of the ICML 2013Workshop on Deep Learning for Audio,Speech,and Language Processing.Atlanta,USA,2013:1-11
    [126]Faruqui M,Dyer C.Non-distributional word vector representations//Proceedings of the 53rd Annual Meeting of the Association for Computational Linguistics.Beijing,China,2015:464-469
    [127]Tang D,Wei F,Qin B,et al.Building large-scale twitter-specific sentiment lexicon:A representation learning approach//Proceedings of the 25th International Conference on Computational Linguistics.Dublin,Ireland,2014:172-182
    [128]Chen X,Xu L,Liu Z,et al.Joint learning of character and word embeddings//Proceedings of the 24th International Joint Conference on Artificial Intelligence.Buenos Aires,Argentina,2015:1236-1242
    [129]Li Y,Li W,Sun F,et al.Component-enhanced Chinese character embeddings//Proceedings of the Conference on Empirical Methods in Natural Language Processing.Lisbon,Portugal,2015:829-834
    [130]Wang H,Li C,Zhang Z,et al.Topic modeling for short texts with auxiliary word embeddings//Proceedings of the 39th International ACM SIGIR Conference on Research and Development in Information Retrieval.Pisa,Italy,2016:165-174
    [131]Ganguly D,Roy D,Mitra M,et al.Word embedding based generalized language model for information retrieval//Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.Santiago,Chile,2015:795-798
    [132]Vulić I,Moens M-F.Monolingual and cross-lingual information retrieval models based on(bilingual)word embeddings//Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.Santiago,Chile,2015:363-372
    [133]Mitra B.Exploring session context using distributed representations of queries and reformulations//Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.Santiago,Chile,2015:3-12
    [134]Grbovic M,Djuric N,Radosavljevic V,et al.Context-and content-aware embeddings for query rewriting in sponsored search//Proceedings of the 38th International ACM SIGIR Conference on Research and Development in Information Retrieval.Santiago,Chile,2015:383-392
    [135]Fu R,Guo J,Qin B,et al.Learning semantic hierarchies via word embeddings//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:1199-1209
    [136]Soricut R,Och F.Unsupervised morphology induction using word embeddings//Proceedings of the Conference of the North American Chapter of the Association for Computational Linguistics:Human Language Technologies.Denver,Colorado,2015:1627-1637
    [137]Fyshe A,Talukdar P P,Murphy B,et al.Interpretable semantic vectors from a joint model of brain-and text-based meaning//Proceedings of the 52nd Annual Meeting of the Association for Computational Linguistics.Baltimore,Maryland,2014:489-499
    [138]Faruqui M,Dyer C.Improving vector space word representations using multilingual correlation//Proceedings of the14th Conference of the European Chapter of the Association for Computational Linguistics.Gothenburg,Sweden,2014:462-471
    [139]Wilson B J,Schakel A M J.Controlled Experiments for Word Embeddings.arXiv preprint arXiv:1510.02675,2015
    [140]Schakel A M J,Wilson B J.Measuring Word Significance using Distributed Representations of Words.arXiv preprint arXiv:1508.02297,2015
    [141]Socher R,Perelygin A,Wu J,et al.Recursive deep models for semantic compositionality over a sentiment treebank//Proceedings of the 2013 Conference on Empirical Methods in Natural Language Processing.Seattle,USA,2013:1631-1642
    [142]Socher R,Huang E H,Pennin J,et al.Dynamic pooling and unfolding recursive autoencoders for paraphrase detection//Proceedings of the Advances in Neural Information Processing Systems 24.Granada,Spain,2011:801-809
    [143]Blacoe W,Lapata M.A comparison of vector-based representations for semantic composition//Proceedings of the 2012 Conference on Empirical Methods in Natural Language Processing.Jeju Island,Korea,2012:546-556
    [144]Mitchell J,Lapata M.Composition in distributional models of semantics.Cognitive Science,2010,34(8):1388-1429
    [145]Gershman S J,Tenenbaum J B.Phrase similarity in humans and machines//Proceedings of the 37th Annual Conference of the Cognitive Science Society.Pasadena,USA,2015:776-781
    [146]Chandar A P S,Lauly S,Larochelle H,et al.An autoencoder approach to learning bilingual word representations//Proceedings of the Advances in Neural Information Processing Systems 27.Montreal,Canada,2014:1853-1861
    [147]Erk K.Representing words as regions in vector space//Proceedings of the 13th Conference on Computational Natural Language Learning.Stroudsburg,USA,2009:57-65
    [148]Vilnis L,McCallum A.Word representations via Gaussian embedding//Proceedings of the International Conference on Learning Representations.San Diego,USA,2015:1-12
    [149]Koo T,Carreras X,Collins M.Simple semi-supervised dependency parsing//Proceedings of the 46th Annual Meeting of the Association for Computational Linguistics.Columbus,Ohio,2008:595-603
    [150]Liang P.Semi-Supervised Learning for Natural Language[M.S.dissertation].Massachusetts Institute of Technology,Cambridge,USA,2005
    (1)Brown clustering hierarchically clusters words into a binary tree by maximizing the mutual information between the classes of adjacent words; the leaves of the tree are words and the internal nodes are classes. These clusters can also be used as word representations[82,94,149-150].
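The Brown clustering objective in the footnote above can be sketched directly: start with one class per word and greedily merge the pair of classes whose merge best preserves the average mutual information between adjacent classes. This brute-force toy version (the corpus and the merge loop without the original algorithm's windowing optimizations are assumptions for illustration) only demonstrates the objective, not an efficient implementation.

```python
import math
from collections import Counter
from itertools import combinations

def adjacent_mi(class_seq):
    """Average mutual information between the classes of adjacent tokens."""
    pairs = list(zip(class_seq, class_seq[1:]))
    n = len(pairs)
    joint = Counter(pairs)
    left = Counter(a for a, _ in pairs)
    right = Counter(b for _, b in pairs)
    return sum((c / n) * math.log((c * n) / (left[a] * right[b]))
               for (a, b), c in joint.items())

def brown_clusters(tokens, k):
    """Greedy agglomerative clustering maximizing adjacent-class MI."""
    cluster = {w: w for w in set(tokens)}          # each word starts alone
    while len(set(cluster.values())) > k:
        classes = sorted(set(cluster.values()))
        best, best_mi = None, -float("inf")
        for c1, c2 in combinations(classes, 2):    # try every possible merge
            merged = {w: (c1 if c == c2 else c) for w, c in cluster.items()}
            mi = adjacent_mi([merged[w] for w in tokens])
            if mi > best_mi:
                best, best_mi = merged, mi
        cluster = best
    return cluster

tokens = "the cat runs the dog runs the cat sleeps the dog sleeps".split()
c = brown_clusters(tokens, 3)
# Words with identical neighbor distributions merge first (zero MI loss):
print(c["cat"] == c["dog"], c["runs"] == c["sleeps"])  # True True
```

Since merging classes can only lower the adjacent-class mutual information, picking the merge with the highest remaining MI is the same as picking the least lossy merge, which is the criterion the footnote describes.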
    (1)For a distance metric d, d(a,c) ≤ d(a,b) + d(b,c).
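A quick numeric check of the triangle inequality in the footnote above: Euclidean distance satisfies d(a,c) ≤ d(a,b) + d(b,c), while the cosine distance 1 − cos(u,v) commonly used with word vectors is not a true metric and can violate it. The three 2-d vectors below are an assumed counterexample, not from the paper.

```python
import numpy as np

def euclidean(u, v):
    return float(np.linalg.norm(u - v))

def cosine_dist(u, v):
    return 1.0 - float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))

a = np.array([1.0, 0.0])
b = np.array([1.0, 1.0]) / np.sqrt(2)   # halfway between a and c
c = np.array([0.0, 1.0])

# Euclidean: d(a,c) ~ 1.414 <= 0.765 + 0.765, the inequality holds.
print(euclidean(a, c) <= euclidean(a, b) + euclidean(b, c))  # True

# Cosine distance: d(a,c) = 1 but d(a,b) + d(b,c) ~ 0.586, violated.
print(cosine_dist(a, c) <= cosine_dist(a, b) + cosine_dist(b, c))  # False
```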
    (1)http://www.longmandictionariesonline.com.
