Enhanced Query Classification with Millions of Fine-Grained Topics
  • Keywords:Multi-class query classification ; Large-scale classification ; Search log mining ; Query clustering
  • Journal:Lecture Notes in Computer Science
  • Publication year:2016
  • Volume:9659
  • Issue:1
  • Pages:120-131
  • Full-text size:472 KB
  • Author affiliations:Qi Ye (18)
    Feng Wang (18)
    Bo Li (18)
    Zhimin Liu (18)

    18. Sogou Inc., Beijing, China
  • Book series:Web-Age Information Management
  • ISBN:978-3-319-39958-4
  • Publication category:Computer Science
  • Publication subjects:Artificial Intelligence and Robotics
    Computer Communication Networks
    Software Engineering
    Data Encryption
    Database Management
    Computation by Abstract Devices
    Algorithm Analysis and Problem Complexity
  • Publisher:Springer Berlin / Heidelberg
  • ISSN:1611-3349
Abstract
Query classification is a crucial task for understanding user search intents. Although this problem has been well studied over the past decades, it remains a major challenge in real-world applications because queries are sparse, noisy and ambiguous. In this paper, we present another important issue, which we call “the pomegranate phenomenon”. The name refers to the gap between a small, manually manageable taxonomy and the massive number of coherent topics within each category. Furthermore, fine-grained topics in one category of the taxonomy may be textually closer to topics in other categories. This phenomenon hurts the performance of most traditional classification methods. To overcome this problem, we present a practical approach that enhances the performance of traditional query classifiers. First, we detect millions of fine-grained query topics, each representing a different query intent, from two years of click logs, and assign them category labels. Second, for a given query, we compute the K most relevant topics, select a label by majority voting, and then use this label to improve the results of classical query classification methods. Empirical evaluation confirms that our topic-based classification algorithms can significantly enhance the performance of traditional classifiers in real-world query classification tasks.
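
The second step of the abstract (find the K most relevant topics, vote on a label, then use it to adjust a base classifier) can be illustrated with a minimal sketch. Everything specific here is an assumption made for illustration, not the paper's method: the toy `TOPICS` data, the Jaccard token overlap used as the relevance score, the function names `topic_vote` and `enhance`, and the `min_share` rule for deciding when the vote overrides the base label.

```python
from collections import Counter

# Hypothetical toy topics: (token set describing the topic, category label).
TOPICS = [
    ({"apple", "iphone", "price"}, "electronics"),
    ({"apple", "iphone", "case"}, "electronics"),
    ({"iphone", "screen", "repair"}, "electronics"),
    ({"apple", "pie", "recipe"}, "food"),
    ({"apple", "pie", "crust"}, "food"),
    ({"apple", "juice", "calories"}, "food"),
]


def jaccard(a, b):
    """Token-set overlap; an assumed stand-in for the real relevance score."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def topic_vote(query, topics, k=3):
    """Return (label, vote_share) from the k most similar topics, or (None, 0.0)."""
    tokens = set(query.lower().split())
    scored = [(jaccard(tokens, words), label) for words, label in topics]
    # Keep only topics that overlap with the query, most similar first.
    top = sorted((s for s in scored if s[0] > 0), key=lambda s: s[0], reverse=True)[:k]
    if not top:
        return None, 0.0
    label, count = Counter(lbl for _, lbl in top).most_common(1)[0]
    return label, count / len(top)


def enhance(query, base_label, topics, k=3, min_share=0.6):
    """Override the base classifier's label only when the topic vote is decisive."""
    label, share = topic_vote(query, topics, k)
    return label if label is not None and share >= min_share else base_label
```

With this toy data, `enhance("apple pie recipe", "electronics", TOPICS)` replaces a wrong base label with `food`, while a query that matches no topic keeps whatever the base classifier predicted. The word "apple" appearing in both categories is a small-scale picture of topics in one category being textually close to topics in another.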
