Diagnosis of Type 2 Diabetes Based on Cost-sensitive Active Learning Algorithm
  • English title: Diagnosis of Type 2 Diabetes Based on Cost-sensitive Active Learning Algorithm
  • Author: 许智彪 (XU Zhi-biao)
  • Author (English) and affiliation: XU Zhi-biao; School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
  • Keywords: diabetes; diagnostic model; cost-sensitive classification; active learning; logistic regression; support vector machine; artificial neural network
  • Journal code (CNKI): JYXH
  • Journal (English): Computer and Modernization
  • Institution: School of Electronic Information and Electrical Engineering, Shanghai Jiao Tong University
  • Publication date: 2018-06-15
  • Published in: 计算机与现代化 (Computer and Modernization)
  • Year: 2018
  • Issue: 06 (cumulative No. 274)
  • Language: Chinese
  • Pages: 88-94
  • Page count: 7
  • CN: 36-1137/TP
  • CNKI file name: JYXH201806018
Abstract
This study builds a diagnostic model for type 2 diabetes and uses active learning to address the shortage of labeled samples in medical data. The diagnosis of type 2 diabetes can be treated as a cost-sensitive binary classification problem, in which the two kinds of error (missing a diabetic patient and raising a false alarm) carry different costs. Taking logistic regression, support vector machine (SVM) and artificial neural network (ANN) models as base models, the study adopts a cost-sensitive active learning method based on the expected error reduction framework. This method combines active learning with cost-sensitive classification so that the different misclassification costs are taken into account when selecting which samples to label. For type 2 diabetes diagnosis, the cost-sensitive active learning algorithm based on expected error reduction performed best among the compared active learning strategies, reaching the lowest misclassification cost while labeling fewer samples. Active learning can therefore reduce the number of samples that must be labeled in medical data mining and save annotation costs while maintaining model performance.
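The method in the abstract rests on two ingredients: a cost-sensitive decision rule, and expected error reduction (EER) sampling, in which the next sample to label is the one whose label is expected to lower the model's future misclassification cost the most. The Python sketch below illustrates both under stated assumptions; it is not the author's implementation, and the paper's exact selection criterion and any approximations it uses are not reproduced here. It assumes a hand-written logistic regression (one of the three base models named above), hypothetical costs `C_FN = 5` and `C_FP = 1`, synthetic two-dimensional data in place of clinical records, and a brute-force EER loop that retrains the model once per candidate and per possible label. All function names (`decide`, `expected_cost`, `select_query`, `make_toy_data`, `total_cost`) are illustrative.

```python
import math
import random

# Hypothetical costs (illustrative, not taken from the paper): a missed
# diabetic (false negative) is assumed 5x as costly as a false alarm.
C_FN, C_FP = 5.0, 1.0

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def predict_proba(w, x):
    # P(y = 1 | x); the bias is stored in w[-1].
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + w[-1])

def train_logreg(X, y, lr=0.5, epochs=300, l2=0.01):
    # L2-regularized logistic regression fitted by batch gradient descent.
    d, n = len(X[0]), len(X)
    w = [0.0] * (d + 1)
    for _ in range(epochs):
        grad = [0.0] * (d + 1)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            for j in range(d):
                grad[j] += err * xi[j]
            grad[-1] += err
        for j in range(d + 1):
            reg = l2 * w[j] if j < d else 0.0  # the bias is not penalized
            w[j] -= lr * (grad[j] / n + reg)
    return w

def decide(p, c_fn=C_FN, c_fp=C_FP):
    # Predict 1 when its expected cost (1 - p) * c_fp is below p * c_fn,
    # i.e. when p > c_fp / (c_fp + c_fn).
    return 1 if p * c_fn > (1.0 - p) * c_fp else 0

def expected_cost(p, c_fn=C_FN, c_fp=C_FP):
    # Expected misclassification cost of the decision made by decide().
    return min(p * c_fn, (1.0 - p) * c_fp)

def select_query(X_lab, y_lab, pool):
    # Brute-force expected error reduction with a cost-based loss: for each
    # candidate, retrain once per possible label, sum the expected cost over
    # the rest of the pool, and weight by the current P(label | candidate).
    w = train_logreg(X_lab, y_lab)
    best_i, best_risk = -1, float("inf")
    for i, x in enumerate(pool):
        p1 = predict_proba(w, x)
        rest = pool[:i] + pool[i + 1:]
        risk = 0.0
        for label, prob in ((1, p1), (0, 1.0 - p1)):
            w_new = train_logreg(X_lab + [x], y_lab + [label])
            risk += prob * sum(expected_cost(predict_proba(w_new, u)) for u in rest)
        if risk < best_risk:
            best_i, best_risk = i, risk
    return best_i

def total_cost(w, X, y):
    # Realized cost of the cost-sensitive decisions against the true labels.
    cost = 0.0
    for x, t in zip(X, y):
        pred = decide(predict_proba(w, x))
        if t == 1 and pred == 0:
            cost += C_FN
        elif t == 0 and pred == 1:
            cost += C_FP
    return cost

def make_toy_data(n, seed=0):
    # Synthetic 2-D stand-in for patient features (not clinical data);
    # every third sample is positive.
    rng = random.Random(seed)
    X, y = [], []
    for i in range(n):
        label = 1 if i % 3 == 0 else 0
        c = 1.0 if label else -1.0
        X.append([rng.gauss(c, 1.0), rng.gauss(c, 1.0)])
        y.append(label)
    return X, y

if __name__ == "__main__":
    X, y = make_toy_data(30, seed=1)
    lab = [0, 1, 2, 3]                 # seed set with labels 1, 0, 0, 1
    pool = list(range(4, len(X)))      # samples whose labels are hidden
    for _ in range(3):
        j = select_query([X[i] for i in lab], [y[i] for i in lab],
                         [X[i] for i in pool])
        lab.append(pool.pop(j))        # the oracle (a clinician) labels it
    w = train_logreg([X[i] for i in lab], [y[i] for i in lab])
    print("labeled:", len(lab), "total cost:", total_cost(w, X, y))
```

Running the script labels three extra samples chosen by `select_query` and prints the total misclassification cost on the toy data. Because brute-force EER retrains the model twice for every pool candidate at every query, it only suits small pools like this one; larger pools generally need candidate subsampling or cheaper incremental model updates.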