Multilabel classification with meta-level features in a learning-to-rank framework
  • Authors: Yiming Yang (1) yiming@cs.cmu.edu
    Siddharth Gopal (2) sgopal1@andrew.cmu.edu
  • Keywords: Multilabel classification – Learning to rank
  • Journal: Machine Learning
  • Publication year: 2012
  • Publication date: July 2012
  • Volume: 88
  • Issue: 1-2
  • Pages: 47-68
  • Full-text size: 646.5 KB
  • References: 1. Arya, S., Mount, D., Netanyahu, N., Silverman, R., & Wu, A. (1998). An optimal algorithm for approximate nearest neighbor searching fixed dimensions. Journal of the ACM, 45(6), 891–923.
    2. Boutell, M., Luo, J., Shen, X., & Brown, C. (2004). Learning multi-label scene classification. Pattern Recognition, 37(9), 1757–1771.
    3. Burges, C., Shaked, T., Renshaw, E., Lazier, A., Deeds, M., Hamilton, N., & Hullender, G. (2005). Learning to rank using gradient descent. In Proceedings of the 22nd international conference on machine learning (p. 96). New York: ACM.
    4. Burges, C., Ragno, R., & Le, Q. (2007). Learning to rank with nonsmooth cost functions. Advances in Neural Information Processing Systems, 19, 193.
    5. Cao, Z., Qin, T., Liu, T., Tsai, M., & Li, H. (2007). Learning to rank: from pairwise approach to listwise approach. In Proceedings of the 24th international conference on machine learning (p. 136). New York: ACM.
    6. Cheng, W., & Hüllermeier, E. (2009). Combining instance-based learning and logistic regression for multilabel classification. Machine Learning, 76(2–3), 211–225.
    7. Creecy, R., Masand, B., Smith, S., & Waltz, D. (1992). Trading MIPS and memory for knowledge engineering. Communications of the ACM, 35(8), 48–64.
    8. Demšar, J. (2006). Statistical comparisons of classifiers over multiple data sets. Journal of Machine Learning Research, 7, 30.
    9. Donmez, P., Svore, K., & Burges, C. (2009). On the local optimality of LambdaRank. In Proceedings of the 32nd international ACM SIGIR conference on research and development in information retrieval (pp. 460–467). New York: ACM.
    10. Elisseeff, A., & Weston, J. (2001). Kernel methods for multi-labelled classification and categorical regression problems. In Advances in neural information processing systems (Vol. 14, pp. 681–687). Cambridge: MIT Press.
    11. Freund, Y., Iyer, R., Schapire, R., & Singer, Y. (2003). An efficient boosting algorithm for combining preferences. Journal of Machine Learning Research, 4, 933–969.
    12. Ganapathiraju, A., Hamaker, J., & Picone, J. (1998). Support vector machines for speech recognition. In International conference on spoken language processing (pp. 2923–2926). New York: ACM.
    13. García, S., & Herrera, F. (2008). An extension on statistical comparisons of classifiers over multiple data sets for all pairwise comparisons. Journal of Machine Learning Research, 9, 2677–2694.
    14. Gopal, S., & Yang, Y. (2010). Multilabel classification with meta-level features. In Proceeding of the 33rd international ACM SIGIR conference on research and development in information retrieval (pp. 315–322). New York: ACM.
    15. Hastie, T., Tibshirani, R., & Friedman, J. (2009). The elements of statistical learning.
    16. Järvelin, K., & Kekäläinen, J. (2000). IR evaluation methods for retrieving highly relevant documents. In Proceedings of the 23rd annual international ACM SIGIR conference on research and development in information retrieval (pp. 41–48). New York: ACM.
    17. Joachims, T. (1999). Making large-scale support vector machine learning practical. In Advances in kernel methods: support vector learning.
    18. Joachims, T. (2002). Optimizing search engines using clickthrough data. In Proceedings of the eighth ACM SIGKDD international conference on knowledge discovery and data mining (pp. 133–142). New York: ACM.
    19. Kleinberg, J. (1997). Two algorithms for nearest-neighbor search in high dimensions. In Proceedings of the twenty-ninth annual ACM symposium on theory of computing (pp. 599–608). New York: ACM.
    20. Lewis, D., Schapire, R., Callan, J., & Papka, R. (1996). Training algorithms for linear text classifiers. In Proceedings of the 19th annual international ACM SIGIR conference on research and development in information retrieval (pp. 298–306). New York: ACM.
    21. Li, P., Burges, C., Wu, Q., Platt, J., Koller, D., Singer, Y., & Roweis, S. (2007) McRank: Learning to rank using multiple classification and gradient boosting. Advances in Neural Information Processing Systems.
    22. Qin, T., Liu, T., Xu, J., & Li, H. (2010) LETOR: A benchmark collection for research on learning to rank for information retrieval. Information Retrieval, 1–29.
    23. Roussopoulos, N., Kelley, S., & Vincent, F. (1995). Nearest neighbor queries. In ACM sigmod record (Vol. 24, pp. 71–79). New York: ACM.
    24. Schapire, R., & Singer, Y. (1999). Improved boosting algorithms using confidence-rated predictions. Machine Learning, 37(3), 297–336.
    25. Schapire, R., & Singer, Y. (2000). BoosTexter: A boosting-based system for text categorization. Machine Learning, 39(2), 135–168.
    26. Trohidis, K., Tsoumakas, G., Kalliris, G., & Vlahavas, I. (2008). Multilabel classification of music into emotions. In Proc. 9th international conference on music information retrieval (ISMIR 2008), Philadelphia, PA, USA (Vol. 2008).
    27. Tsai, M., Liu, T., Qin, T., Chen, H., & Ma, W. (2007). FRank: A ranking method with fidelity loss. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 390). New York: ACM.
    28. Tsochantaridis, I., Joachims, T., Hofmann, T., & Altun, Y. (2006). Large margin methods for structured and interdependent output variables. Journal of Machine Learning Research, 6(2), 1453.
    29. Tsoumakas, G., Vilcek, J., Spyromitros, E., & Vlahavas, I. (2010). Mulan: a Java library for multilabel learning. Journal of Machine Learning Research, 1, 1–48.
    30. Vapnik, V. (2000). The nature of statistical learning theory. Berlin: Springer.
    31. Voorhees, E. (2003) Overview of TREC 2002. NIST special publication SP (pp. 1–16).
    32. Xu, J., & Li, H. (2007). Adarank: A boosting algorithm for information retrieval. In Proceedings of the 30th annual international ACM SIGIR conference on research and development in information retrieval (p. 398). New York: ACM.
    33. Yang, Y. (1994). Expert network: Effective and efficient learning from human decisions in text categorization and retrieval. In ACM SIGIR conference on research and development in information retrieval (pp. 13–22). New York: Springer.
    34. Yang, Y. (1999). An evaluation of statistical approaches to text categorization. Information Retrieval, 1(1), 69–90.
    35. Yang, Y. (2001). A study of thresholding strategies for text categorization. In Proceedings of the 24th annual international ACM SIGIR conference on research and development in information retrieval (pp. 137–145). New York: ACM.
    36. Yang, Y., Liu, X. (1999). A re-examination of text categorization methods. In Proceedings of the 22nd annual international ACM SIGIR conference on research and development in information retrieval (pp. 42–49). New York: ACM.
    37. Yang, Y., & Pedersen, J. (1997) A comparative study on feature selection in text categorization. In International conference in machine learning (pp. 412–420). Citeseer.
    38. Yianilos, P. (1993). Data structures and algorithms for nearest neighbor search in general metric spaces. In Proceedings of the fourth annual ACM-SIAM symposium on discrete algorithms (pp. 311–321). Philadelphia: Society for Industrial and Applied Mathematics.
    39. Yue, Y., & Finley, T. (2007). A support vector method for optimizing average precision. In Proceedings of SIGIR07 (pp. 271–278). New York: ACM.
    40. Zhang, M., & Zhou, Z. (2007). ML-KNN: A lazy learning approach to multi-label learning. Pattern Recognition, 40(7), 2038–2048.
  • Author affiliations: 1. Language Technologies Institute & Machine Learning Department, Carnegie Mellon University, Pittsburgh, USA
    2. Language Technologies Institute, Carnegie Mellon University, Pittsburgh, USA
  • Journal category: Computer Science
  • Journal subjects: Artificial Intelligence and Robotics
    Automation and Robotics
    Computing Methodologies
    Simulation and Modeling
    Language Translation and Linguistics
  • Publisher: Springer Netherlands
  • ISSN:1573-0565
Abstract
Effective learning in multi-label classification (MLC) requires an appropriate level of abstraction for representing the relationship between each instance and multiple categories. Current MLC methods have focused on learning to map from instances to categories in a relatively low-level feature space, such as individual words. The fine-grained features in such a space may not be sufficiently expressive for learning to rank categories, which is essential in multi-label classification. This paper presents an alternative solution by transforming the conventional representation of instances and categories into meta-level features, and by leveraging successful learning-to-rank retrieval algorithms over this feature space. Controlled experiments on six benchmark datasets using eight evaluation metrics show strong evidence for the effectiveness of the proposed approach, which significantly outperformed other state-of-the-art methods such as Rank-SVM, ML-kNN (multi-label kNN), and IBLR-ML (instance-based logistic regression for multi-label classification) on most of the datasets. Thorough analyses are also provided for separating the factors responsible for the improved performance.
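The core idea in the abstract can be illustrated with a minimal sketch. This is not the authors' exact formulation: the feature definitions, weights, and helper names below are illustrative assumptions. For each (instance, category) pair we replace raw word features with a few neighborhood-based meta-level features (here, the fraction of the instance's k nearest training neighbors carrying that category, and their total similarity), then rank categories by a linear score over those features. The paper learns such a scoring function with learning-to-rank algorithms; here the weights are simply fixed.

```python
# Hedged sketch of meta-level features for multilabel ranking.
# Assumed helpers (cosine, meta_features, rank_categories) and the fixed
# weights are illustrative, not the paper's actual method.
import math

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def meta_features(x, train, k=3):
    """Per-category meta features from x's k nearest training neighbors."""
    sims = sorted(((cosine(x, xi), yi) for xi, yi in train),
                  key=lambda p: p[0], reverse=True)[:k]
    cats = {c for _, y in train for c in y}
    feats = {}
    for c in cats:
        votes = sum(1 for s, y in sims if c in y)   # neighbor count for c
        weight = sum(s for s, y in sims if c in y)  # similarity mass for c
        feats[c] = (votes / k, weight)
    return feats

def rank_categories(x, train, w=(1.0, 1.0), k=3):
    """Rank categories by a linear score over the meta features.
    (The paper would learn w with a learning-to-rank method.)"""
    feats = meta_features(x, train, k)
    score = {c: w[0] * f[0] + w[1] * f[1] for c, f in feats.items()}
    return sorted(score, key=score.get, reverse=True)

# Toy corpus: 3-dimensional "documents" with label sets.
train = [
    ([1.0, 0.0, 0.0], {"sports"}),
    ([0.9, 0.1, 0.0], {"sports", "news"}),
    ([0.0, 1.0, 0.0], {"music"}),
    ([0.0, 0.9, 0.1], {"music"}),
]
print(rank_categories([0.95, 0.05, 0.0], train))
# → ['sports', 'news', 'music']
```

Note that the meta-level representation has the same small dimensionality for every category, which is what lets a single ranking function score all categories for an instance, in contrast to one classifier per category over raw word features.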
