Application of Data Mining in Breast Cancer Recurrence Prediction (数据挖掘在乳腺癌复发预测中的应用研究)
  • English title: Application of data mining in breast cancer recurrence prediction
  • Authors: 程国建; 张晗; 魏珺洁
  • Authors (English): CHENG Guojian; ZHANG Han; WEI Junjie (School of Computer Science, Xi'an Shiyou University)
  • Keywords: data mining; breast cancer; C4.5 algorithm; Naive Bayes; SVM; ten-fold cross-validation; recurrence prediction
  • Chinese journal title: DLXZ
  • English journal title: Intelligent Computer and Applications
  • Affiliation: School of Computer Science, Xi'an Shiyou University (西安石油大学计算机学院)
  • Publication date: 2019-02-18
  • Publisher: 智能计算机与应用 (Intelligent Computer and Applications)
  • Year: 2019
  • Issue: v.9
  • Language: Chinese
  • Pages: DLXZ201902021
  • Page count: 4
  • CN: 02
  • ISSN: 23-1573/TN
  • Classification number: 104-107
Abstract
Breast cancer is a malignant tumor of the breast, and a number of factors can cause it to recur after surgery. Postoperative recurrence not only makes the disease harder to treat but also harms patients' physical and mental health. Data mining is a specific step in knowledge discovery that uses specialized algorithms to extract useful knowledge from massive data sets. It supports tasks such as classification, clustering, prediction, and association analysis, so using data mining algorithms to predict whether breast cancer is likely to recur can help guide treatment. The article uses breast cancer data from the Institute of Oncology at the University Medical Centre in Ljubljana, Yugoslavia, provided by Zwitter and Soklic. The experiments classify the data with the C4.5 algorithm, naive Bayes, and SVM, each evaluated with ten-fold cross-validation, to predict whether breast cancer is likely to recur. Finally, the article analyzes the prediction results of the three algorithms together and identifies each algorithm's strengths and weaknesses for breast cancer recurrence prediction.
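The evaluation procedure named in the abstract, ten-fold cross-validation of a classifier, can be illustrated with a minimal Python sketch. This is not the authors' implementation: it covers only a categorical naive Bayes classifier (not C4.5 or SVM), uses the standard library alone, and runs on made-up toy data. The function names, the two attribute names, and the rule that generates the labels are all assumptions introduced for illustration.

```python
import math
import random
from collections import Counter, defaultdict


def k_fold_indices(n, k=10, seed=0):
    """Shuffle 0..n-1 and split the indices into k nearly equal folds."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    return [idx[i::k] for i in range(k)]


def train_nb(rows, labels):
    """Fit a categorical naive Bayes model by counting."""
    class_counts = Counter(labels)
    value_counts = defaultdict(Counter)  # (feature index, class) -> value counts
    values_seen = defaultdict(set)       # feature index -> distinct values
    for row, y in zip(rows, labels):
        for j, v in enumerate(row):
            value_counts[(j, y)][v] += 1
            values_seen[j].add(v)
    return class_counts, value_counts, values_seen


def predict_nb(model, row):
    """Return the class with the highest log posterior (Laplace smoothing)."""
    class_counts, value_counts, values_seen = model
    total = sum(class_counts.values())
    best, best_score = None, -math.inf
    for y, cy in class_counts.items():
        score = math.log(cy / total)
        for j, v in enumerate(row):
            num = value_counts[(j, y)][v] + 1
            den = cy + len(values_seen[j])
            score += math.log(num / den)
        if score > best_score:
            best, best_score = y, score
    return best


def cross_validate(rows, labels, k=10, seed=0):
    """Train on k-1 folds, test on the held-out fold, and pool the accuracy."""
    correct = 0
    for test_idx in k_fold_indices(len(rows), k, seed):
        held_out = set(test_idx)
        train_rows = [r for i, r in enumerate(rows) if i not in held_out]
        train_labels = [y for i, y in enumerate(labels) if i not in held_out]
        model = train_nb(train_rows, train_labels)
        correct += sum(predict_nb(model, rows[i]) == labels[i] for i in test_idx)
    return correct / len(rows)


def make_toy_data(n=200, seed=1):
    """Generate hypothetical records; NOT the Ljubljana breast cancer data."""
    rng = random.Random(seed)
    rows, labels = [], []
    for _ in range(n):
        grade = rng.choice(["1", "2", "3"])      # hypothetical attribute
        node_caps = rng.choice(["yes", "no"])    # hypothetical attribute
        # Made-up rule with 20% label noise, only so there is something to learn.
        risky = grade == "3" or node_caps == "yes"
        recurs = risky if rng.random() < 0.8 else not risky
        rows.append((grade, node_caps))
        labels.append("recurrence" if recurs else "no-recurrence")
    return rows, labels


rows, labels = make_toy_data()
accuracy = cross_validate(rows, labels)
print(f"10-fold CV accuracy on toy data: {accuracy:.3f}")
```

Each record is used for testing exactly once and for training in the other nine rounds, so the pooled accuracy uses all of the data without ever scoring a model on records it was trained on. Comparing several classifiers under this scheme would mean swapping out `train_nb` and `predict_nb` while keeping the same folds.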
