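Long-term waste classification based on self-training

English title: Long-term waste classification based on self-training
Authors: Liu Yaxuan (刘雅璇); Pan Wanbin (潘万彬)
Authors (English): Liu Yaxuan; Pan Wanbin; School of Media and Design, Hangzhou Dianzi University; Department of Mechanical Engineering, National University of Singapore
Keywords: waste classification; self-training; ensemble classifier; Bagging; long-term classification
English keywords: waste classification; self-training; ensemble classifier; Bagging; long-term classification
Journal (Chinese abbreviation): ZGTB
Journal (English): Journal of Image and Graphics
Affiliations: School of Media and Design, Hangzhou Dianzi University; Department of Mechanical Engineering, National University of Singapore
Publication date: 2019-05-16
Publisher: Journal of Image and Graphics (中国图象图形学报)
Year: 2019
Volume and issue: v.24; No.277
Funding: National Natural Science Foundation of China (61702147); Open Project of the State Key Laboratory of CAD&CG, Zhejiang University (A1816)
Language: Chinese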
File name: ZGTB201905010
Page count: 9
Issue number: 05
CN: 11-3758/TB
Pages: 113-121

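Abstract

Objective: Waste is currently classified mainly by looking up its name. Such methods usually rely on a predefined classification of the data, so they can hardly cover all existing waste, let alone the waste types that will keep appearing in the future. To address this problem, a long-term waste classification method based on self-training is proposed for domestic waste.

Method: First, Bagging is used to combine two base classifiers that differ in classification ability and training mechanism, the K-nearest neighbor (KNN) classifier and the support vector machine (SVM), according to their independent votes and weights, yielding a novel ensemble classifier for domestic waste. Second, based on intuitive image-based interactive feedback, the confidence of the classifier's results and the cloud-based training set are updated dynamically, which improves the accuracy of subsequent classification and the self-learning ability of the method.

Result: A training set of 233 domestic waste entries was used to train the prototype system, and 151 waste samples were used for testing. The experiments show that the proposed ensemble classifier reaches a classification accuracy of about 95% on domestic waste. The proportion of incorrect samples in the training set was then increased step by step (≤30%), the ensemble classifier was retrained, and a total of 150 classification tests were run on the same 151 samples. The average accuracy shows that the ensemble classifier keeps a relatively high and stable accuracy (≥93%). In addition, when the feedback mechanism was added to these experiments, the average accuracy shows that the mechanism effectively mitigates the accuracy loss caused by incorrect samples.

Conclusion: The proposed method classifies domestic waste with high accuracy and robustness and has good long-term effectiveness.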
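
The ensemble described in the Method part can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the authors' implementation. The keyword-set features, the category names, the class names (`KeywordKNN`, `KeywordPerceptron`, `WeightedBagging`), and the choice of training-set accuracy as the voting weight are all hypothetical. A multi-class perceptron is used as a simple linear stand-in for the SVM so that the example needs only the standard library, and the misclassification oversampling step is left out.

```python
import random
from collections import Counter


def jaccard(a, b):
    """Overlap between two keyword sets, from 0.0 (disjoint) to 1.0 (equal)."""
    a, b = set(a), set(b)
    union = a | b
    return len(a & b) / len(union) if union else 0.0


class KeywordKNN:
    """k-nearest-neighbour base classifier over keyword sets."""

    def __init__(self, k=3):
        self.k = k

    def fit(self, samples):
        self.samples = list(samples)
        return self

    def predict(self, keywords):
        ranked = sorted(self.samples,
                        key=lambda s: jaccard(keywords, s[0]), reverse=True)
        votes = Counter(label for _, label in ranked[:self.k])
        return votes.most_common(1)[0][0]


class KeywordPerceptron:
    """Multi-class perceptron: a simple linear stand-in for the SVM."""

    def __init__(self, epochs=10):
        self.epochs = epochs

    def fit(self, samples):
        self.labels = sorted({label for _, label in samples})
        self.w = {label: Counter() for label in self.labels}
        for _ in range(self.epochs):
            for keywords, label in samples:
                guess = self.predict(keywords)
                if guess != label:  # move weights toward the true label
                    for kw in keywords:
                        self.w[label][kw] += 1
                        self.w[guess][kw] -= 1
        return self

    def predict(self, keywords):
        return max(self.labels,
                   key=lambda lab: sum(self.w[lab][kw] for kw in keywords))


class WeightedBagging:
    """Bagging over both base types; each member votes with its own weight."""

    def __init__(self, factories, bags_per_type=5, seed=0):
        self.factories = factories
        self.bags_per_type = bags_per_type
        self.rng = random.Random(seed)
        self.members = []  # list of (model, weight) pairs

    def fit(self, samples):
        self.members = []
        for make in self.factories:
            for _ in range(self.bags_per_type):
                # Bootstrap: draw len(samples) items with replacement.
                bag = [self.rng.choice(samples) for _ in samples]
                model = make().fit(bag)
                # Assumed weighting: accuracy on the full training set.
                hits = sum(model.predict(kw) == lab for kw, lab in samples)
                self.members.append((model, hits / len(samples)))
        return self

    def predict(self, keywords):
        scores = Counter()
        for model, weight in self.members:
            scores[model.predict(keywords)] += weight  # weighted vote
        return scores.most_common(1)[0][0]


# Hypothetical toy data: (keyword set, category) pairs.
train = [
    ({"banana", "peel", "fruit"}, "kitchen"),
    ({"apple", "core", "fruit"}, "kitchen"),
    ({"leftover", "rice", "food"}, "kitchen"),
    ({"plastic", "bottle", "drink"}, "recyclable"),
    ({"glass", "bottle", "jar"}, "recyclable"),
    ({"newspaper", "paper", "cardboard"}, "recyclable"),
    ({"battery", "cell", "button"}, "hazardous"),
    ({"expired", "medicine", "pill"}, "hazardous"),
    ({"paint", "can", "solvent"}, "hazardous"),
]
ensemble = WeightedBagging([KeywordKNN, KeywordPerceptron]).fit(train)
prediction = ensemble.predict({"orange", "peel", "fruit"})
```

Giving each member its own weight lets a weak bootstrap replica be outvoted by stronger ones, which is one plausible reading of "independent votes and weights". The actual weighting scheme is not specified in the abstract.
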
Objective: As consumption rises, daily waste grows in both quantity and variety. Classifying waste correctly is important for protecting human health and maintaining a clean and safe environment. With the spread of the internet and the development of information technology, looking up waste by name on a smartphone has become a popular classification method. However, this method usually relies on a static classification of the data, so it is difficult to cover all waste or to extend the approach to new types of waste. To address this problem, this study proposes a long-term classification method for domestic waste based on self-training.

Method: The proposed method uses machine learning to update its training set and to retrain itself on the basis of users' inputs and feedback, which are collected through waste image selection. Thus, the more users participate, the higher the classification accuracy of the method. The method consists of two main parts. 1) For effective classification, we adopt a new ensemble classifier that integrates K-nearest neighbor (KNN) classifiers and support vector machines (SVMs) as base classifiers, combining them through bagging on the basis of independent voting and weights. Misclassification oversampling is combined with bagging to improve the accuracy of these base classifiers. 2) A feedback mechanism based on image selection automatically updates the classifier's confidence and extends the waste training set, thereby improving classification accuracy and self-training ability.

Result: A prototype domestic waste classifier was developed to validate the method. A training set of 233 waste samples is used to train the ensemble classifier, and a test set of 151 waste samples is used to evaluate its accuracy and robustness. The experiments demonstrate that the average classification accuracy of the ensemble classifier (approximately 95%) is better than that of each base classifier. As the proportion of incorrect samples in the training set is gradually increased (≤30%), we retrain the ensemble classifier on the data and then run a classification test on the same test set. The average accuracy shows that the ensemble classifier maintains a relatively high and stable classification accuracy (≥93%), and that the feedback mechanism effectively alleviates the negative influence of incorrect samples.

Conclusion: Waste classification is closely related to people's health and environmental protection. However, long-term methods that remain effective as the number and types of waste increase are still rare, especially on mobile platforms. This work therefore presents a new long-term classification method for domestic waste based on self-training. The method is characterized by accurate and robust domestic waste classification and by a self-learning ability, which are supported by a novel ensemble classifier and a feedback mechanism. The method still has limitations. 1) The waste image input is used mainly by the feedback mechanism, whereas waste features are described mainly by text, because general and effective methods for extracting waste features from images remain rare. 2) An automatic feedback mechanism should be studied to raise the automation level of the entire method.
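
The feedback step in part 2) of the Method can be illustrated with a small sketch. The abstract does not give the update rule, so everything here is an assumption: the function name `apply_feedback`, the fixed `step` size, and the clamping of confidences to the range 0 to 1 are hypothetical. The idea shown is only that a user-confirmed label raises the confidence of members that voted for it, lowers the confidence of the others, and adds the confirmed sample to the training set.

```python
def apply_feedback(weights, votes, confirmed, training_set, keywords, step=0.1):
    """Update member confidences and the training set from one confirmation.

    weights      -- current confidence of each ensemble member
    votes        -- label each member predicted for this item
    confirmed    -- label the user confirmed (for example by picking an image)
    training_set -- list of (keywords, label) pairs, extended in place
    keywords     -- features of the item that was just classified
    step         -- assumed fixed adjustment per confirmation (hypothetical)
    """
    updated = []
    for weight, vote in zip(weights, votes):
        if vote == confirmed:
            weight = min(1.0, weight + step)  # reward members that agreed
        else:
            weight = max(0.0, weight - step)  # penalise members that disagreed
        updated.append(weight)
    training_set.append((keywords, confirmed))  # grow the sample set
    return updated


# Hypothetical usage: three members voted, the user confirmed "kitchen".
training_set = [({"banana", "peel"}, "kitchen")]
weights = [0.75, 0.5, 0.125]
votes = ["kitchen", "recyclable", "recyclable"]
weights = apply_feedback(weights, votes, "kitchen", training_set,
                         {"orange", "peel"}, step=0.25)
```

In a full system the enlarged training set would periodically be used to retrain the ensemble. How often that happens, and how confirmed but incorrect feedback is handled, is not stated in the abstract.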

