Abstract
Breast cancer has become a major disease affecting women's health worldwide. In breast cancer diagnosis, misclassifying a malignant case as benign costs far more than misclassifying a benign case as malignant. This paper applies cost-sensitive methods from the field of data mining to build a diagnostic prediction system that distinguishes benign breast tumors from malignant ones. Misclassification cost is taken into account throughout model building, and a misclassification cost strategy is proposed. The resulting model is validated through a series of experiments. The experimental results show that the misclassification-cost-based combination of AdaBoost and SVM performs well on both evaluation metrics, accuracy and total misclassification cost.
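As a rough illustration of the two evaluation metrics named above, the following minimal sketch computes accuracy and total misclassification cost for a set of predictions. The cost values (10 for a missed malignant case, 1 for a false alarm), the label encoding, and the function name `evaluate` are assumptions made for this example; the abstract does not state the costs or the implementation actually used.

```python
# Hypothetical cost values: the abstract does not state the costs it used.
COST_FALSE_NEGATIVE = 10.0  # malignant case predicted as benign (assumed)
COST_FALSE_POSITIVE = 1.0   # benign case predicted as malignant (assumed)


def evaluate(y_true, y_pred,
             fn_cost=COST_FALSE_NEGATIVE, fp_cost=COST_FALSE_POSITIVE):
    """Return (accuracy, total misclassification cost).

    Assumed label encoding: 1 = malignant, 0 = benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    correct = false_negatives = false_positives = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        elif truth == 1:
            false_negatives += 1  # malignant missed: the expensive error
        else:
            false_positives += 1  # benign flagged: the cheaper error

    accuracy = correct / len(y_true)
    total_cost = false_negatives * fn_cost + false_positives * fp_cost
    return accuracy, total_cost


# Two classifiers with identical accuracy but different error types.
y_true = [1, 1, 0, 0, 0]
pred_a = [0, 1, 0, 0, 0]  # one malignant case missed
pred_b = [1, 1, 1, 0, 0]  # one benign case flagged as malignant

result_a = evaluate(y_true, pred_a)
result_b = evaluate(y_true, pred_b)
```

Under these assumed costs both classifiers have the same accuracy, yet the one that misses a malignant case incurs a much higher total cost, which is why the two metrics are reported together.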