Dynamic classifier ensemble for positive unlabeled text stream classification
  • Authors: Shirui Pan (1) panshirui@nwsuaf.edu.cn
    Yang Zhang (1) zhangyang@nwsuaf.edu.cn
    Xue Li (2) xueli@itee.uq.edu.au
  • Keywords: Positive unlabeled learning; Text streams; Classifier ensemble; Concept drift
  • Journal: Knowledge and Information Systems
  • Publication year: 2012
  • Publication date: November 2012
  • Volume: 33
  • Issue: 2
  • Pages: 267-287
  • Full-text size: 566.9 KB
  • References: 1. Calvo B, Larranaga P, Lozano JA (2005) Learning bayesian classifiers from positive and unlabeled examples. Pattern Recognit Lett 28(16): 2375–2384
    2. Cheng R, Kalashnikov D, Prabhakar S (2005) Learning from positive and unlabeled examples. Theor Comput Sci 38(1): 70–83
    3. Didaci L, Giacinto G, Roli F, Marcialis GL (2005) A study on the performances of dynamic classifier selection based on local accuracy estimation. Pattern Recognit 38(11): 2188–2191
    4. Dietterich TG (2002) Ensemble methods in machine learning. In: Proceedings of the first international workshop on multiple classifier systems, pp 1–15
    5. Domingos P, Hulten G (2000) Mining high-speed data streams. In: Proceedings of the sixth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’00). Boston, pp 71–80
    6. Fan W (2004) Systematic data selection to mine concept-drifting data streams. In: Proceedings of the tenth ACM SIGKDD international conference on knowledge discovery and data mining (KDD’04), ACM Press, pp 128–137
    7. Fan W, Huang YA, Wang H, Yu PS (2004a) Active mining of data streams. In: Proceedings of the fourth SIAM international conference on data mining (SDM’04), pp 457–461
    8. Fan W, Huang YA, Yu PS (2004b) Decision tree evolution using limited number of labeled data items from drifting data streams. In: Proceedings of the fourth IEEE international conference on data mining (ICDM’04), pp 379–382
    9. Fung GPC, Yu JX, Lu H, Yu PS (2006) Text classification without negative examples revisit. IEEE Trans Knowl Data Eng 18(1): 6–20
    10. Grossi V, Turini F (2010) Stream mining: a novel architecture for ensemble-based classification. Knowl Inf Syst: 1–35. doi:10.1007/s10115-011-0378-4
    11. Huang S, Dong Y (2007) An active learning system for mining time-changing data streams. Intell Data Anal 11(4): 401–419
    12. Hulten G, Spencer L, Domingos P (2001) Mining time-changing data streams. In: Proceedings of the seventh ACM SIGKDD international conference on knowledge discovery and data mining (KDD’01), pp 97–106
    13. Klinkenberg R, Joachims T (2000) Detecting concept drift with support vector machines. In: Proceedings of the seventeenth international conference on machine learning (ICML’00), pp 487–494
    14. Ko A, Sabourin R, Britto A Jr (2008) From dynamic classifier selection to dynamic ensemble selection. Pattern Recognit 41(5): 1718–1731
    15. Kolter JZ, Maloof MA (2003) Dynamic weighted majority: a new ensemble method for tracking concept drift. In: Proceedings of the third international conference on data mining (ICDM’03), pp 123–130
    16. Lewis DD, Yang Y, Rose TG, Li F (2004) RCV1: a new benchmark collection for text categorization research. J Mach Learn Res 5: 361–397
    17. Li C, Zhang Y, Li X (2009a) OcVFDT: one-class very fast decision tree for one-class classification of data streams. In: Proceedings of the third international workshop on knowledge discovery from sensor data. Paris, pp 79–86
    18. Li X, Liu B (2003) Learning to classify texts using positive and unlabeled data. In: International joint conference on artificial intelligence (IJCAI’03), pp 587–594
    19. Li X, Liu B (2005) Learning from positive and unlabeled examples with different data distributions. In: Proceedings of European conference on machine learning (ECML’05), pp 218–229
    20. Li XL, Yu PS, Liu B, Ng SK (2009b) Positive unlabeled learning for data stream classification. In: Proceedings of the ninth SIAM international conference on data mining (SDM’09), pp 257–268
    21. Liu B, Lee WS, Yu PS, Li X (2002) Partially supervised classification of text documents. In: Proceedings of the nineteenth international conference on machine learning (ICML’02)
    22. Liu B, Dai Y, Li X, Lee WS, Yu PS (2003) Building text classifiers using positive and unlabeled examples. In: Proceedings of the third IEEE international conference on data mining (ICDM’03), pp 179–186
    23. Salton G, Buckley C (1988) Term-weighting approaches in automatic text retrieval. Inf Process Manag 24(5):513–523
    24. Schölkopf B, Platt JC, Shawe-Taylor J, Smola AJ, Williamson RC (2001) Estimating the support of a high-dimensional distribution. Neural Comput 13(7): 1443–1471
    25. Sebastiani F (2002) Machine learning in automated text categorization. ACM Comput Surv 34(1): 1–47
    26. Street W, Kim Y (2001) A streaming ensemble algorithm (SEA) for large-scale classification. In: Proceedings of the seventh international conference on knowledge discovery and data mining (KDD’01), pp 377–382
    27. Tsymbal A, Pechenizkiy M, Cunningham P, Puuronen S (2008) Dynamic integration of classifiers for handling concept drift. Inf Fusion 9(1): 56–68
    28. Wang H, Fan W, Yu PS, Han J (2003) Mining concept-drifting data streams using ensemble classifiers. In: Proceedings of the ninth international conference on knowledge discovery and data mining (KDD’03), pp 226–235
    29. Widmer G, Kubat M (1993) Effective learning in dynamic environments by explicit context tracking. In: European conference on machine learning (ECML’93). Springer, Berlin, pp 227–243
    30. Widmer G, Kubat M (1996) Learning in the presence of concept drift and hidden contexts. Mach Learn 23(1): 69–101
    31. Widyantoro D, Yen J (2005) Relevant data expansion for learning concept drift from sparsely labeled data. IEEE Trans Knowl Data Eng 17(3): 401–412
    32. Woods K, Kegelmeyer WP Jr, Bowyer K (1997) Combination of multiple classifiers using local accuracy estimates. IEEE Trans Pattern Anal Mach Intell 19(4): 405–410
    33. Wozniak M (2010) A hybrid decision tree training method using data streams. Knowl Inf Syst: 1–13. doi:10.1007/s10115-010-0345-5
    34. Wu S, Yang C, Zhou J (2006) Clustering-training for data stream mining. In: Proceedings of the sixth IEEE international conference on data mining workshops (ICDMW’06), pp 653–656
    35. Yu H, Han J, Chang KCC (2004) PEBL: web page classification without negative examples. IEEE Trans Knowl Data Eng 16(1):70–81
    36. Zhang B, Zuo W (2008) Learning from positive and unlabeled examples: a survey. In: International symposiums on information processing, IEEE Computer Society, Los Alamitos, pp 650–654
    37. Zhang P, Zhu X, Shi Y (2008a) Categorizing and mining concept drifting data streams. In: Proceeding of the 14th ACM SIGKDD international conference on Knowledge discovery and data mining (KDD’08). Las Vegas, pp 812–820
    38. Zhang Y, Jin X (2006) An automatic construction and organization strategy for ensemble learning on data streams. ACM SIGMOD Rec 35(3): 28–33
    39. Zhang Y, Li X, Orlowska M (2008b) One-class classification of text streams with concept drift. In: Proceedings of the 2008 IEEE international conference on data mining workshops (ICDMW’08), pp 116–125
    40. Zhou ZH, Wu J, Tang W (2002) Ensembling neural networks: many could be better than all. Artif Intell 137(1–2): 239–263
    41. Zhu X, Wu X, Yang Y (2006) Effective classification of noisy data streams with attribute-oriented dynamic classifier selection. Knowl Inf Syst 9(3): 339–363
    42. Zhu X, Zhang P, Lin X, Shi Y (2007) Active learning from data streams. In: Proceedings of the seventh international conference on data mining (ICDM’07), pp 757–762
    43. Zhu X, Ding W, Yu P, Zhang C (2010) One-class learning and concept summarization for data streams. Knowl Inf Syst: 1–31. http://dx.doi.org/10.1007/s10115-010-0331-y
  • Affiliations: 1. College of Information Engineering, Northwest A&F University, Yangling, China; 2. School of Information Technology and Electrical Engineering, University of Queensland, Brisbane, Australia
  • ISSN:0219-3116
Abstract
Most studies on streaming data classification assume that the data can be fully labeled. However, in real-life applications, manually labeling the entire stream for training is impractical and time-consuming. In data stream environments it is very common that only a small set of positive examples and a large amount of unlabeled data are available. In this case, applying traditional streaming algorithms to a positive unlabeled stream with only straightforward adaptation may lead to poor performance. In this paper, we propose a Dynamic Classifier Ensemble method for Positive and Unlabeled text stream (DCEPU) classification scenarios. We address the problem of classifying positive and unlabeled text streams with various types of concept drift by constructing an appropriate validation set and designing a novel dynamic weighting scheme in the classification phase. Experimental results on the benchmark dataset RCV1-v2 demonstrate that the proposed method DCEPU outperforms the existing LELC (Li et al. 2009b), DVS (with necessary adaptation) (Tsymbal et al. 2008), and Stacking-style ensemble-based algorithm (Zhang et al. 2008b).
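
The abstract does not spell out how the validation set is built or how the weights are computed, so the following is only a minimal sketch of the general idea of dynamic weighting, not the DCEPU algorithm itself. It weights each ensemble member by its accuracy on the validation examples most similar to the incoming document, in the spirit of local-accuracy methods such as Woods et al. (1997). The names `cosine`, `local_weights`, and `ensemble_predict`, the parameter `k`, the toy classifiers, and the fully labeled validation set are all assumptions made for illustration; in a real positive unlabeled setting, reliable negative labels would not be available directly.

```python
import math
from typing import Callable, List, Sequence, Tuple

Vector = Sequence[float]
Classifier = Callable[[Vector], int]      # returns +1 (positive) or -1 (negative)
Example = Tuple[Vector, int]              # (feature vector, label)


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity between two term-weight vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def local_weights(classifiers: List[Classifier], validation: List[Example],
                  x: Vector, k: int = 3) -> List[float]:
    """Weight each classifier by its accuracy on the k validation
    examples most similar to x (hypothetical weighting rule)."""
    neighbours = sorted(validation, key=lambda ex: cosine(ex[0], x),
                        reverse=True)[:k]
    if not neighbours:
        return [0.0 for _ in classifiers]
    return [sum(1 for v, label in neighbours if clf(v) == label) / len(neighbours)
            for clf in classifiers]


def ensemble_predict(classifiers: List[Classifier], validation: List[Example],
                     x: Vector, k: int = 3) -> int:
    """Weighted vote; the weights are recomputed for every incoming x."""
    weights = local_weights(classifiers, validation, x, k)
    score = sum(w * clf(x) for w, clf in zip(weights, classifiers))
    return 1 if score >= 0 else -1


# Toy data (assumed): two-dimensional "documents" and two base classifiers.
toy_validation: List[Example] = [
    ([1.0, 0.0], 1), ([0.9, 0.1], 1), ([0.0, 1.0], -1), ([0.1, 0.9], -1),
]
toy_classifiers: List[Classifier] = [
    lambda v: 1 if v[0] > v[1] else -1,   # compares the two term weights
    lambda v: 1,                          # always predicts positive
]

if __name__ == "__main__":
    query = [0.2, 0.8]
    print(local_weights(toy_classifiers, toy_validation, query, k=2))
    print(ensemble_predict(toy_classifiers, toy_validation, query, k=2))
```

Because the weights depend on the neighbourhood of each incoming document, a member that was trained on an outdated concept can receive a low weight in regions where it no longer agrees with the validation data, without being removed from the ensemble.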
