Research on a Prognosis Prediction Method for Breast Tumors Based on Imbalanced Classification
  • English title: Prognosis prediction method of breast tumor based on unbalanced classification
  • Authors: WANG Zhe (王哲); YANG Ridong (杨日东); ZHOU Yi (周毅); ZHANG Xueliang (张学良); WANG Kai (王凯)
  • Affiliations: School of Public Health, Xinjiang Medical University; Zhongshan School of Medicine, Sun Yat-Sen University; School of Medical Engineering and Technology, Xinjiang Medical University
  • Keywords: breast cancer; class imbalance; machine learning; sensitivity; sampling technique
  • Journal: Journal of Xinjiang Medical University (新疆医科大学学报)
  • Journal code: XJYY
  • Publication date: 2019-04-15
  • Publisher: Journal of Xinjiang Medical University
  • Year: 2019
  • Volume: 42
  • Funding: National Natural Science Foundation of China (61876194); National Key Research and Development Program of China (2018YFC0116902)
  • Language: Chinese
  • Article ID: XJYY201904028
  • Number of pages: 4
  • Issue: 04
  • CN: 65-1204/R
  • Pages: 131-134
Abstract
Objective: To investigate machine-learning prediction of prognostic survival status from a class-imbalanced breast cancer dataset.

Methods: Survival-status data for breast tumor prognosis are class-imbalanced. To address this, the data were processed with four sampling techniques: SMOTE, Borderline-SMOTE, ADASYN, and One-Sided Selection (OSS). Four classifiers (classical decision tree, conditional decision tree, random forest, and support vector machine) were then evaluated by accuracy, sensitivity, specificity, positive hit rate, and negative hit rate.

Results: On the original dataset, without any sampling technique applied, all four machine learning methods achieved good prediction accuracy; the support vector machine (SVM) was highest at 90.42%. The undersampling method OSS combined with the conditional decision tree gave the best prognosis prediction on the imbalanced breast tumor dataset, raising sensitivity from 2% to 58%, a gain of 56 percentage points. SVM applied to the unprocessed dataset had the highest specificity (100%). OSS combined with SVM had the highest positive hit rate (40%), and OSS combined with the conditional decision tree had the highest negative hit rate (95%).

Conclusion: Preprocessing class-imbalanced data can markedly improve sensitivity. Among the sampling techniques compared, OSS is the most suitable for breast cancer prognosis models.
