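通过训练样本采样处理改善小宗作物遥感识别精度 (Improving the remote sensing recognition accuracy of minority crops by resampling training samples)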
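  • English title: Improvement in recognition accuracy of minority crops by resampling of imbalanced training datasets of remote sensing
  • Authors: 樊东东 ; 李强子 ; 王红岩 ; 张源 ; 杜鑫 ; 沈宇
  • Authors (English): FAN Dongdong; LI Qiangzi; WANG Hongyan; ZHANG Yuan; DU Xin; SHEN Yu; Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences; College of Resources and Environmental Sciences of University of Chinese Academy of Sciences
  • Keywords: crop recognition ; imbalanced datasets ; sampling ; remote sensing ; minority crops ; GF-2 (Gaofen-2)
  • English keywords: crops recognition; imbalanced datasets; resampling; remote sensing; minority crops; GF-2
  • Chinese journal title: YGXB
  • English journal title: Journal of Remote Sensing
  • Institutions: Institute of Remote Sensing and Digital Earth, Chinese Academy of Sciences; College of Resources and Environmental Sciences, University of Chinese Academy of Sciences
  • Publication date: 2019-07-25
  • Publisher: 遥感学报 (Journal of Remote Sensing)
  • Year: 2019
  • Issue: v.23
  • Funding: National Natural Science Foundation of China (No. 41571422)
  • Language: Chinese
  • Pages: YGXB201904014
  • Page count: 13
  • CN: 04
  • ISSN: 11-3841/TP
  • Classification number: 168-180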
Abstract
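Training sample quality is a key factor determining the accuracy of crop recognition by remote sensing. Although the development of high-spatial-resolution satellites has effectively resolved the mixed-pixel problem in crop recognition, when the planting areas of different crops in a region differ greatly, the numbers of samples per class in the training set often differ greatly as well. Such an imbalanced dataset affects classifier training and leads to unsatisfactory recognition accuracy for the minority classes. To study the imbalanced-sample problem in crop recognition by remote sensing, this paper uses GF-2 satellite data. First, the spectral and texture information of ground objects was extracted, and features were selected with Recursive Feature Elimination (RFE). Then, from a data-processing perspective, five sampling algorithms were applied to the imbalanced training set. Finally, the balanced datasets obtained by sampling were used to train classifiers, and the recognition results of two classifiers, decision tree and AdaBoost (Adaptive Boosting), were compared before and after sampling. The findings are: (1) after sampling, both classification algorithms clearly improved the classification accuracy of minority crops; (2) after ADASYN (adaptive synthetic sampling), classifier performance improved the most: the Kappa coefficient of the decision tree increased by 14.32% and that of AdaBoost by 10.23%, reaching the highest value of 0.9336; (3) oversampling outperformed undersampling, improving classifier performance more. In summary, choosing suitable sampling and classification methods is an effective way to improve the remote sensing classification accuracy of imbalanced datasets.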
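The Kappa coefficient reported above can be illustrated with a minimal sketch. The function name `cohen_kappa` and the crop labels are hypothetical and not taken from the paper; the code only shows the standard definition of Cohen's kappa, which corrects observed agreement for the agreement expected by chance.

```python
from collections import Counter


def cohen_kappa(y_true, y_pred):
    """Cohen's kappa: label agreement corrected for chance agreement."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(y_true)
    # Observed agreement: fraction of samples given the same label.
    p_o = sum(t == p for t, p in zip(y_true, y_pred)) / n
    # Chance agreement, computed from the marginal class frequencies.
    true_counts = Counter(y_true)
    pred_counts = Counter(y_pred)
    p_e = sum(true_counts[c] * pred_counts.get(c, 0) for c in true_counts) / (n * n)
    if p_e == 1.0:
        # Kappa is undefined when both labelings use a single identical class;
        # returning 1.0 here is a choice made for this sketch only.
        return 1.0
    return (p_o - p_e) / (1.0 - p_e)


# Hypothetical imbalanced test set: 8 majority-crop and 2 minority-crop samples.
y_true = ["maize"] * 8 + ["soybean"] * 2
# A classifier that ignores the minority crop still reaches 0.8 overall accuracy,
# yet its kappa is 0.0 because it does no better than chance.
print(cohen_kappa(y_true, ["maize"] * 10))
```

This is why kappa, together with per-class producer's and user's accuracy, is more informative than overall accuracy alone when the classes are imbalanced.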
The rapid development of high-spatial-resolution satellites has effectively alleviated the problem of mixed pixels in satellite images, making it possible to extract the detailed distribution of crops. Classification of remote sensing images is a quick way to obtain accurate agricultural information. However, the accuracy of supervised classification is affected by several factors, such as the classification algorithm and the input datasets. Imbalanced training samples, in which some categories have considerably fewer or more training samples than others, often result in poor classification accuracy for the minority classes. To improve this situation and the generalization performance of classifiers, this research focused on the proper use of resampling techniques and classification methods to achieve better performance in remote sensing image classification. We extracted spectral and texture features from the images and selected an optimized feature set by recursive feature elimination. Then, five resampling methods, namely three oversampling methods and two undersampling methods, were used separately to balance the initial training datasets. Finally, we tested the resampled datasets with two classifiers (decision tree and AdaBoost) and evaluated the performance of each in terms of kappa coefficient, overall accuracy, producer's accuracy, and user's accuracy. After resampling, the overall classification accuracy and kappa coefficient improved considerably; with ADASYN, the kappa coefficient rose by 14.32% for the decision tree and by 10.23% for AdaBoost, which reached the highest kappa value (0.9336). The classification accuracy of minority crops also increased after the training datasets were resampled.
Feature selection results showed that vegetation indices and texture features contributed more to classification than the original reflectance features. Resampling the training datasets markedly improves classifier performance when the training datasets are severely imbalanced. The detailed accuracy assessment shows that oversampling performs better than undersampling: some significant samples are lost during undersampling, whereas useful information is added by oversampling. AdaBoost performs better than the decision tree on imbalanced training datasets. Combining a proper resampling approach with a compatible classifier can significantly improve the accuracy of minority classes in imbalanced dataset classification.
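The contrast between oversampling and undersampling drawn above can be sketched in a few lines. The abstract does not list all five methods, so the code below is only an illustration under stated assumptions: `smote_like_oversample` is plain SMOTE-style interpolation (not ADASYN itself, which additionally weights samples by local difficulty), `random_undersample` is simple random undersampling, and all function names, parameters, and feature values are hypothetical.

```python
import math
import random


def smote_like_oversample(minority, n_new, k=3, seed=0):
    """Create n_new synthetic points by interpolating between a minority
    sample and one of its k nearest minority-class neighbours."""
    if len(minority) < 2:
        raise ValueError("need at least two minority samples")
    rng = random.Random(seed)
    k = min(k, len(minority) - 1)
    synthetic = []
    for _ in range(n_new):
        i = rng.randrange(len(minority))
        base = minority[i]
        # Rank the other minority samples by Euclidean distance to the base.
        others = [p for j, p in enumerate(minority) if j != i]
        others.sort(key=lambda p: math.dist(base, p))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append(tuple(b + gap * (q - b) for b, q in zip(base, neighbour)))
    return synthetic


def random_undersample(majority, n_keep, seed=0):
    """Keep a random subset of the majority class; the rest is discarded."""
    rng = random.Random(seed)
    return rng.sample(majority, n_keep)


# Hypothetical two-feature samples (for example, a vegetation index and a texture measure).
minority = [(0.10, 0.80), (0.12, 0.78), (0.15, 0.82), (0.11, 0.85)]
majority = [(0.50 + 0.01 * i, 0.30 + 0.005 * i) for i in range(20)]

balanced_minority = minority + smote_like_oversample(minority, n_new=6)
reduced_majority = random_undersample(majority, n_keep=10)
print(len(balanced_minority), len(reduced_majority))
```

The sketch makes the trade-off concrete: undersampling throws away half of the majority samples, while the interpolated points add new minority samples that stay inside the region spanned by the observed ones.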