Learning Topic-Oriented Word Embedding for Query Classification
  • Authors: Hebin Yang (10) (11)
    Qinmin Hu (10) (11)
    Liang He (10) (11)

    10. Department of Computer Science and Technology, East China Normal University, Shanghai 200241, China
    11. Shanghai Key Laboratory of Multidimensional Information Processing, East China Normal University, Shanghai 200241, China
  • Keywords: Query classification; Word embedding; Word2vec; Supervised learning
  • Journal: Lecture Notes in Computer Science
  • Publication year: 2015
  • Volume: 9077
  • Issue: 1
  • Pages: 188-198
  • Full-text size: 667 KB
  • Book title: Advances in Knowledge Discovery and Data Mining
  • ISBN: 978-3-319-18037-3
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1611-3349
Abstract
In this paper, we propose a topic-oriented word embedding approach to the query classification problem. First, topic information is encoded to generate query categories. Then, user click-through information is incorporated into the modified word embedding algorithms. After that, the short and ambiguous queries are enriched so that they can be classified in a supervised manner. Our main contribution is a set of four neural network strategies based on the proposed model. The experiments are conducted on two open data sets from Baidu and Sogou, two well-known commercial search companies. The evaluation results show that the proposed approach is promising on both large data sets. Under the four proposed strategies, we achieve performance as high as 95.73% in terms of precision and 97.79% in terms of the F1 measure.
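
The abstract's general idea of representing a short query by word embeddings and then classifying it in a supervised way can be illustrated with a minimal sketch. This is not the authors' model: it does not use topic or click-through information, and it replaces the neural network strategies with a simple nearest-centroid classifier. The word vectors, the training queries, and all names (`WORD_VECTORS`, `embed_query`, `train_centroids`, `classify`) are hypothetical and hand-written for illustration.

```python
from math import sqrt

# Hypothetical 3-dimensional word vectors, hand-written for illustration only.
# In practice such vectors would be learned, e.g. by a word2vec-style model.
WORD_VECTORS = {
    "cheap":    [0.9, 0.1, 0.0],
    "flights":  [0.8, 0.2, 0.1],
    "hotel":    [0.7, 0.3, 0.0],
    "python":   [0.0, 0.9, 0.2],
    "compiler": [0.1, 0.8, 0.1],
    "tutorial": [0.1, 0.7, 0.3],
}


def embed_query(query):
    """Average the vectors of the known words in a query; skip unknown words."""
    vectors = [WORD_VECTORS[w] for w in query.lower().split() if w in WORD_VECTORS]
    if not vectors:
        return None  # no known word, so no representation
    dim = len(vectors[0])
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = sqrt(sum(x * x for x in a)) * sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def train_centroids(labeled_queries):
    """Supervised step: build one mean vector (centroid) per category."""
    sums, counts = {}, {}
    for query, label in labeled_queries:
        vec = embed_query(query)
        if vec is None:
            continue
        if label not in sums:
            sums[label] = [0.0] * len(vec)
            counts[label] = 0
        sums[label] = [s + x for s, x in zip(sums[label], vec)]
        counts[label] += 1
    return {label: [s / counts[label] for s in sums[label]] for label in sums}


def classify(query, centroids):
    """Assign the category whose centroid is most similar to the query vector."""
    vec = embed_query(query)
    if vec is None:
        return None
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))


# Hypothetical labeled training queries.
TRAINING = [
    ("cheap flights", "travel"),
    ("cheap hotel", "travel"),
    ("python tutorial", "computing"),
    ("compiler tutorial", "computing"),
]
CENTROIDS = train_centroids(TRAINING)
```

Calling `classify("hotel flights", CENTROIDS)` assigns the query to the category with the closest centroid. A query made up only of unknown words yields `None`, which hints at why short, sparse queries benefit from enrichment before classification.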
