Multilabel classification with meta-level features in a learning-to-rank framework
  • Authors: Yiming Yang (1) yiming@cs.cmu.edu
    Siddharth Gopal (2) sgopal1@andrew.cmu.edu
  • Keywords: Multilabel classification – Learning to rank
  • Journal: Machine Learning
  • Publication year: 2012
  • Publication date: July 2012
  • Volume: 88
  • Issue: 1-2
  • Pages: 47-68
  • Full-text size: 646.5 KB
  • References: 1. Arya, S., Mount, D., Netanyahu, N., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6), 891–923.
    2. Boutell, M., Luo, J., Shen, X., & Brown, C. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.
    3. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (p. 96). New York: ACM.
    4. Burges, C., Ragno, R., & Le, Q. (2007). Learning to rank with nonsmooth cost functions. Advances in Neural Information Processing Systems, 19, 193.
    5. Cao, Z., Qin, T., Liu, T., Tsai, M., & Li, H. (2007). Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on machine learning (p. 136). New York: ACM.
    6. Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2–3), 211–225.
    7. Creecy, R., Masand, B., Smith, S., & Waltz, D. (1992). Trading MIPS and memory for knowledge engineering. Communications of the ACM, 35(8), 48–64.
    8. Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 30.
    9. Donmez, P., Svore, K., & Burges, C. (2009). On the local optimality of LambdaRank. In Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval (pp. 460–467). New York: ACM.
    10. Elisseeff, A., & Weston, J. (2001). Kernel methods for multi-labelled classification and categorical regression problems. In Advances in neural information processing systems (Vol. 14, pp. 681–687). Cambridge: MIT Press.
    11. Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
    12. Ganapathiraju, A., Hamaker, J., & Picone, J. (1998). Support vector machines for speech recognition. In International conference on spoken language processing (pp. 2923–2926). New York: ACM.
    13. García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
    14. Gopal, S., & Yang, Y. (2010). Multilabel classification with meta-level features. In Proceeding of the 33rd international ACM SIGIR conference on research and development in information retrieval (pp. 315–322). New York: ACM.
    15. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning.
    16. Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41–48). New York: ACM.
    17. Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning.
    18. Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 133–142). New York: ACM.
    19. Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 599–608). New York: ACM.
    20. Lewis, D., Schapire, R., Callan, J., & Papka, R. (1996). Training algorithms for linear text classifiers. In Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298–306). New York: ACM.
    21. Li, P., Burges, C., Wu, Q., Platt, J., Koller, D., Singer, Y., & Roweis, S. (2007) McRank: Learning to rank using multiple classification and gradient boosting. Advances in Neural Information Processing Systems.
    22. Qin, T., Liu, T., Xu, J., & Li, H. (2010) LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 1–29.
    23. Roussopoulos, N., Kelley, S., & Vincent, F. (1995). Nearest neighbor queries. In ACM sigmod record (Vol. 24, pp. 71–79). New York: ACM.
    24. Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
    25. Schapire, R., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2), 135–168.
    26. Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proc. 9th international conference on music information retrieval (ISMIR 2008), Philadelphia, PA, USA (Vol. 2008).
    27. Tsai, M., Liu, T., Qin, T., Chen, H., & Ma, W. (2007). FRank: A ranking method with fidelity loss. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 390). New York: ACM.
    28. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2006). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2), 1453.
    29. Tsoumakas, G., Vilcek, J., Spyromitros, E., & Vlahavas, I. (2010). Mulan: a Java library for multilabel learning. Journal of Machine Learning Research, 1, 1–48.
    30. Vapnik, V. (2000). The nature of statistical learning theory. Berlin: Springer.
    31. Voorhees, E. (2003) Overview of TREC 2002. NIST special publication SP (pp. 1–16).
    32. Xu, J., & Li, H. (2007). Adarank: A boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 398). New York: ACM.
    33. Yang, Y. (1994). Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In ACM SIGIR conference on research and development in information retrieval (pp. 13–22). New York: Springer.
    34. Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), 69–90.
    35. Yang, Y. (2001). A study of thresholding strategies for text categorization. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 137–145). New York: ACM.
    36. Yang, Y., Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 42–49). New York: ACM.
    37. Yang, Y., & Pedersen, J. (1997) A comparative study on feature selection in text categorization. In International conference in machine learning (pp. 412–420). Citeseer.
    38. Yianilos, P. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms (pp. 311–321). Philadelphia: Society for Industrial and Applied Mathematics.
    39. Yue, Y., & Finley, T. (2007). A support vector method for optimizing average precision. In Proceedings of SIGIR07 (pp. 271–278). New York: ACM.
    40. Zhang, M., & Zhou, Z. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
  • Author affiliations: 1. Language Technologies Institute & Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
    2. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Automation and Robotics
    Computing Methodologies
    Simulation and Modeling
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN:1573-0565
Abstract
Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have focused on learning to map from instances to categories in a relatively low-level feature space, such as individual words. The fine-grained features in such a space may not be sufficiently expressive for learning to rank categories, which is essential in multi-label classification. This paper presents an alternative solution by transforming the conventional representation of instances and categories into meta-level features, and by leveraging successful learning-to-rank retrieval algorithms over this feature space. Controlled experiments on six benchmark datasets using eight evaluation metrics show strong evidence for the effectiveness of the proposed approach, which significantly outperformed other state-of-the-art methods such as Rank-SVM, ML-kNN (multi-label kNN), and IBLR-ML (instance-based logistic regression for multi-label classification) on most of the datasets. Thorough analyses are also provided for separating the factors responsible for the improved performance.
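The core idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact formulation: the feature definitions, weights, and helper names below are illustrative assumptions. For each (instance, category) pair we replace raw word features with a few neighborhood-based meta-level features (here, the fraction of the instance's k nearest training neighbors carrying that category, and their total similarity), then rank categories by a linear score over those features. The paper learns such a scoring function with learning-to-rank algorithms; here the weights are simply fixed.

```python
# Hedged sketch of meta-level features for multilabel ranking.
# Assumed helpers (cosine, meta_features, rank_categories) and the fixed
# weights are illustrative, not the paper's actual method.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def meta_features(x, train, k=3):
    """Per-category meta features from x's k nearest training neighbors."""
    sims = sorted(((cosine(x, xi), yi) for xi, yi in train),
                  key=lambda p: p[0], reverse=True)[:k]
    cats = {c for _, y in train for c in y}
    feats = {}
    for c in cats:
        votes = sum(1 for s, y in sims if c in y)   # neighbor count for c
        weight = sum(s for s, y in sims if c in y)  # similarity mass for c
        feats[c] = (votes / k, weight)
    return feats

def rank_categories(x, train, w=(1.0, 1.0), k=3):
    """Rank categories by a linear score over the meta features.
    (The paper would learn w with a learning-to-rank method.)"""
    feats = meta_features(x, train, k)
    score = {c: w[0] * f[0] + w[1] * f[1] for c, f in feats.items()}
    return sorted(score, key=score.get, reverse=True)

# Toy corpus: 3-dimensional "documents" with label sets.
train = [
    ([1.0, 0.0, 0.0], {"sports"}),
    ([0.9, 0.1, 0.0], {"sports", "news"}),
    ([0.0, 1.0, 0.0], {"music"}),
    ([0.0, 0.9, 0.1], {"music"}),
]
print(rank_categories([0.95, 0.05, 0.0], train))
# → ['sports', 'news', 'music']
```

Note that the meta-level representation has the same small dimensionality for every category, which is what lets a single ranking function score all categories for an instance, in contrast to one classifier per category over raw word features.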
