Research on breast cancer diagnosis based on the pattern of misclassification
  • Original title (Chinese): 基于误分类模式的乳腺癌诊断研究
  • Authors: Gao Jirong (高集荣); Tian Yan (田艳); Yang Yonghong (杨永红); Liu Qinghua (刘清华)
  • Keywords: data mining; cost sensitive; misclassification cost; breast cancer
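  • Journal name (Chinese): WXJY
  • Journal name (English): Microcomputer & Its Applications
  • Affiliations: Department of Computer Science, Sun Yat-sen University; Institute of Statistics, Xi'an University of Finance and Economics
  • Publication date: 2017-02-14 14:52
  • Publisher: Microcomputer & Its Applications (微型机与应用)
  • Year: 2017
  • Issue: v.36; No.466
  • Language: Chinese
  • Page: WXJY201702004
  • Page count: 5
  • CN: 02
  • ISSN: 11-5881/TP
  • Classification number: 14-17+20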
Abstract
Breast cancer has become a major disease affecting women's health worldwide. In breast cancer diagnosis, the cost of misclassifying a malignant case as benign is far greater than the cost of misclassifying a benign case as malignant. This paper applies cost-sensitive methods from data mining to build a diagnostic prediction system that distinguishes benign breast tumors from malignant ones. Misclassification cost is taken fully into account during modeling, and a misclassification cost strategy is proposed. The resulting model is validated through a series of experiments. The results show that a misclassification-cost-based combination of Adaboost and SVM performs well on both evaluation metrics, accuracy and total misclassification cost.
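
The two evaluation metrics named in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the cost values, the label encoding (1 = malignant, 0 = benign), and the function name `evaluate` are assumptions chosen only to show why accuracy alone cannot separate a missed malignancy from a false alarm.

```python
# Hypothetical costs: the abstract only says that missing a malignant case
# costs far more than a false alarm; it does not give the actual values.
COST_FALSE_NEGATIVE = 10.0  # malignant predicted as benign (assumed value)
COST_FALSE_POSITIVE = 1.0   # benign predicted as malignant (assumed value)


def evaluate(y_true, y_pred,
             fn_cost=COST_FALSE_NEGATIVE, fp_cost=COST_FALSE_POSITIVE):
    """Return (accuracy, total misclassification cost).

    Labels are assumed to be 1 for malignant and 0 for benign.
    """
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("at least one sample is required")

    correct = false_negatives = false_positives = 0
    for truth, pred in zip(y_true, y_pred):
        if truth == pred:
            correct += 1
        elif truth == 1:
            false_negatives += 1  # malignant case missed
        else:
            false_positives += 1  # benign case flagged as malignant

    accuracy = correct / len(y_true)
    total_cost = false_negatives * fn_cost + false_positives * fp_cost
    return accuracy, total_cost


# Two toy predictions with the same accuracy but different kinds of error.
y_true = [1, 1, 0, 0, 0, 1]
pred_missed_malignant = [1, 0, 0, 0, 0, 1]  # one false negative
pred_false_alarm = [1, 1, 1, 0, 0, 1]       # one false positive

acc_a, cost_a = evaluate(y_true, pred_missed_malignant)
acc_b, cost_b = evaluate(y_true, pred_false_alarm)
print(acc_a, cost_a)
print(acc_b, cost_b)
```

Under these assumed costs both predictions have the same accuracy, yet the one that misses a malignant case carries ten times the total misclassification cost, which is the asymmetry a cost-sensitive model is meant to account for.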
