Classifier Evaluation with Missing Negative Class Labels
  • Authors: Andrew K. Rider (19)
    Reid A. Johnson (19)
    Darcy A. Davis (19)
    T. Ryan Hoens (19)
    Nitesh V. Chawla (19)
  • Keywords: Evaluation; Classification; False Negatives
  • Publication title: Lecture Notes in Computer Science
  • Publication year: 2013
  • Volume: 8207
  • Issue: 1
  • Pages: 392-403
  • Full-text size: 184KB
  • Author affiliation: 19. Department of Computer Science and Engineering, University of Notre Dame, Notre Dame, IN, 46556, USA
  • ISSN: 1611-3349
Abstract
The concept of a negative class does not apply to many problems for which classification is increasingly utilized. In this study we investigate the reliability of evaluation metrics when the negative class contains an unknown proportion of mislabeled positive class instances. We examine how evaluation metrics can inform us about potential systematic biases in the data. We provide a motivating case study and a general framework for approaching evaluation when the negative class contains mislabeled positive class instances. We show that the behavior of evaluation metrics is unstable in the presence of uncertainty in class labels and that the stability of evaluation metrics depends on the kind of bias in the data. Finally, we show that the type and amount of bias present in data can have a significant effect on the ranking of evaluation metrics and the degree to which they over- or underestimate the true performance of classifiers.
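
The effect described in the abstract can be illustrated with a small toy example. The sketch below is not the authors' framework: the data, the helper names `precision_recall` and `hide_positives`, and the choice of precision and recall as the metrics are all assumptions made for illustration. It relabels one true positive as negative and compares the metrics computed against the observed labels with those computed against the true labels.

```python
# Toy illustration (hypothetical data and helper names, not from the paper):
# how positives hidden in the negative class distort precision and recall.

def precision_recall(labels, predictions):
    """Return (precision, recall) for binary labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall

def hide_positives(true_labels, hidden_indices):
    """Relabel the instances at hidden_indices as negative (0)."""
    hidden = set(hidden_indices)
    return [0 if i in hidden else y for i, y in enumerate(true_labels)]

true_labels = [1, 1, 1, 1, 0, 0, 0, 0]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

# Metrics against the true labels: precision 0.75, recall 0.75.
true_p, true_r = precision_recall(true_labels, predictions)

# Bias A: the hidden positive is one the classifier predicts correctly.
# The correct prediction is now counted as a false positive.
obs_a = hide_positives(true_labels, [0])
p_a, r_a = precision_recall(obs_a, predictions)

# Bias B: the hidden positive is one the classifier misses.
# The miss is no longer counted as a false negative.
obs_b = hide_positives(true_labels, [3])
p_b, r_b = precision_recall(obs_b, predictions)

print(f"true labels: precision={true_p:.2f} recall={true_r:.2f}")
print(f"bias A:      precision={p_a:.2f} recall={r_a:.2f}")
print(f"bias B:      precision={p_b:.2f} recall={r_b:.2f}")
```

In this toy setting, hiding a correctly predicted positive lowers both observed metrics, while hiding a missed positive leaves precision unchanged and inflates recall. Which instances are mislabeled therefore determines whether the observed value over- or underestimates the true one.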
