Comparison of the Performance of Logistic Regression, Decision Trees, and Neural Networks in Predicting Diabetic Peripheral Neuropathy in Type 2 Diabetes
Abstract
In recent years, the development of mathematical methods and computer technology has made prediction with complex models possible. Two primary classes of methods are available for building predictive models: statistical methods and data mining methods. Prediction techniques based on both have been applied in biomedical research, but few studies have compared their predictive performance, that is, their generalization ability, so such a comparison is well worth undertaking. In this study, taking case-control data on Diabetic Peripheral Neuropathy (DPN) in type 2 Diabetes Mellitus (described in chapter 2 of this thesis) as an example, we use Logistic Regression (LR), Decision Trees (DT), and Neural Networks (NN) to predict the probability of DPN, and propose practical solutions to several difficulties in model building and in comparing predictive performance.
     The difficulties and the corresponding solutions are as follows:
     (1) Scientific discretization of continuous variables. In some studies, a one-unit change in a continuous variable is not of interest in itself, or subject-matter knowledge requires that the variable be divided into classes. How to discretize continuous variables scientifically is therefore a problem worth studying.
     In this thesis, we use the chi-square partitioning method to discretize continuous variables. This not only produces meaningful class boundaries but also makes the distinction between adjacent classes as large as possible.
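     The χ² partitioning idea can be sketched as a ChiMerge-style procedure: begin with one interval per distinct value, then repeatedly merge the adjacent pair of intervals whose class distributions are most similar (smallest Pearson χ² statistic) until the desired number of classes remains. The following is a minimal illustration of the idea under that assumption, not the thesis's actual implementation; all function names are ours.

```python
from collections import Counter

def chi2_stat(a, b):
    """Pearson chi-square comparing the class counts of two adjacent
    intervals (a, b are Counters mapping class label -> count)."""
    labels = set(a) | set(b)
    n_a, n_b = sum(a.values()), sum(b.values())
    total = n_a + n_b
    stat = 0.0
    for lab in labels:
        col = a[lab] + b[lab]                 # column total for this label
        for obs, n in ((a[lab], n_a), (b[lab], n_b)):
            exp = n * col / total             # expected count under independence
            if exp > 0:
                stat += (obs - exp) ** 2 / exp
    return stat

def chimerge(values, labels, n_bins):
    """Discretize `values` by merging the adjacent interval pair with the
    smallest chi-square until `n_bins` intervals remain.
    Returns the lower bound of each resulting interval."""
    intervals = []                            # list of (lower_bound, Counter)
    for v, lab in sorted(zip(values, labels)):
        if intervals and intervals[-1][0] == v:
            intervals[-1][1][lab] += 1        # same value -> same interval
        else:
            intervals.append((v, Counter({lab: 1})))
    while len(intervals) > n_bins:
        chis = [chi2_stat(intervals[i][1], intervals[i + 1][1])
                for i in range(len(intervals) - 1)]
        i = chis.index(min(chis))             # most similar neighbours
        merged = intervals[i][1] + intervals[i + 1][1]
        intervals[i:i + 2] = [(intervals[i][0], merged)]
    return [lo for lo, _ in intervals]
```

     For example, `chimerge([1, 2, 3, 10, 11, 12], [0, 0, 0, 1, 1, 1], 2)` returns the cut points `[1, 10]`, splitting the variable exactly where the class composition changes.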
     (2) Making full use of the data while avoiding overfitting during model building. When the amount of data is limited, it is particularly important to use as much of the information in it as possible. For decision trees and neural networks built from small samples, doing so while still preventing overfitting is an important problem.
     In this research, we combine the classification and regression tree (CART) with the chi-squared automatic interaction detector (CHAID) tree, using 100 repetitions of 5- to 7-fold stratified cross-validation, to build a decision tree model that makes full use of the data while avoiding overfitting. For the neural network, we use the Schwarz Bayesian Criterion (SBC) to choose the number of hidden layers and hidden units, and train with the Levenberg-Marquardt optimization algorithm, weight decay, and pre-training. The resulting model likewise makes full use of the data while avoiding overfitting and inferior local minima.
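     The repeated stratified cross-validation scheme can be sketched as follows. The `evaluate` callable stands in for fitting and scoring a model on one train/test split (the thesis uses CART/CHAID trees there); the fold generator is our own minimal illustration, not the thesis's code.

```python
import random

def stratified_folds(labels, k, rng):
    """Split sample indices into k folds with (approximately) equal
    class proportions in every fold."""
    by_class = {}
    for i, y in enumerate(labels):
        by_class.setdefault(y, []).append(i)
    folds = [[] for _ in range(k)]
    for idx in by_class.values():
        rng.shuffle(idx)
        for j, i in enumerate(idx):
            folds[j % k].append(i)            # deal indices out round-robin
    return folds

def repeated_cv(labels, k, repeats, evaluate, seed=0):
    """Average an `evaluate(train_idx, test_idx)` score over `repeats`
    independent stratified k-fold splits."""
    rng = random.Random(seed)
    scores = []
    for _ in range(repeats):
        folds = stratified_folds(labels, k, rng)
        for f in range(k):
            test = folds[f]
            train = [i for g, fold in enumerate(folds) if g != f
                     for i in fold]
            scores.append(evaluate(train, test))
    return sum(scores) / len(scores)
```

     Averaging over many random re-partitions in this way lets every observation serve in both training and testing, which is what allows a small data set to be used fully without evaluating on the training data itself.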
     (3) Quick and efficient establishment of a logistic regression model. Conventional variable-selection methods for logistic regression include forward entry, backward elimination, stepwise selection, and best-subset selection. The first three require choosing P-value cutoffs at which variables enter and (or) leave the model, and this choice is inevitably subjective; for example, some studies argue that a significance level for entry (SLE) of 0.05 is too stringent and often excludes important variables. The best-subset method gives a chi-square value for every combination of variables but cannot indicate which combination is optimal. It is therefore important to select variables quickly and effectively so as to build an accurate and reliable model.
     In this thesis, we combine the best-subset method with the Akaike Information Criterion (AIC) to screen variables quickly and easily. This approach takes the generalization ability of the model into account while avoiding the "trouble" of choosing a P-value cutoff, and the resulting logistic regression model is superior to those built with the conventional selection methods.
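     Best-subset screening with AIC can be sketched as exhaustive enumeration: fit a logistic model on every subset of predictors and keep the subset minimizing AIC = 2k - 2 log L. The tiny gradient-ascent fitter below is only a stand-in for a proper maximum-likelihood routine (the thesis uses SAS); step sizes and function names are our assumptions.

```python
import itertools
import math

def _mu(w, xi):
    """Logistic mean, with the linear predictor clamped for numerical safety."""
    z = max(min(sum(wj * xj for wj, xj in zip(w, xi)), 30.0), -30.0)
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, steps=200, lr=0.5):
    """Tiny gradient-ascent logistic fit (illustrative only).
    Rows of X must already include the intercept column; returns (w, logL)."""
    p = len(X[0])
    w = [0.0] * p
    for _ in range(steps):
        grad = [0.0] * p
        for xi, yi in zip(X, y):
            r = yi - _mu(w, xi)               # score residual
            for j in range(p):
                grad[j] += r * xi[j]
        w = [wj + lr * g / len(X) for wj, g in zip(w, grad)]
    ll = sum(yi * math.log(_mu(w, xi)) + (1 - yi) * math.log(1.0 - _mu(w, xi))
             for xi, yi in zip(X, y))
    return w, ll

def best_subset_aic(X, y):
    """Fit every subset of predictors and return (AIC, subset) minimizing
    AIC = 2k - 2 logL, where k counts fitted parameters (incl. intercept)."""
    best = (float("inf"), ())
    for r in range(len(X[0]) + 1):
        for subset in itertools.combinations(range(len(X[0])), r):
            Xs = [[1.0] + [row[j] for j in subset] for row in X]
            _, ll = fit_logistic(Xs, y)
            best = min(best, (2 * (r + 1) - 2 * ll, subset))
    return best
```

     The AIC penalty of 2 per parameter is what replaces the subjective P-value cutoff: an uninformative predictor cannot raise the log-likelihood enough to pay for itself, so it is dropped automatically.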
     (4) Comparison of generalization ability with small samples. A large body of literature shows that, to date, comparative studies of prediction and classification models in the biomedical field have either used fairly large data sets (from several hundred to several hundred thousand observations) or assessed generalization ability with the holdout method (one part of the data for training, the remainder for testing); they do not address how to make full use of the data, or how to compare generalization ability, when the sample is small. In practice, however, data sets are often small (around one hundred observations) with many variables, and the holdout method then wastes data and yields comparisons of low or even unacceptable reliability (as confirmed in chapter 5 of this thesis). How to build models effectively and evaluate their generalization ability objectively with small samples is therefore well worth studying, and is the primary focus of this thesis.
     To make a reliable assessment of generalization error with small samples, we adopt Monte Carlo resampling techniques: 10 to 100 repetitions of 2- to 10-fold stratified cross-validation, the jackknife, and 100 to 1000 bootstrap replications (specifically, the 0.632 bootstrap). These allow an objective comparison of the generalization ability of the three models (LR, DT, NN) and avoid the drawbacks of the holdout method noted above. For the DPN data, the results show that, overall, NN generalizes best, followed by LR, with DT worst.
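     The 0.632 bootstrap mentioned above weights the optimistic resubstitution error against the pessimistic out-of-bag error, err = 0.368 * err_resub + 0.632 * err_oob, with the out-of-bag term averaged over B resamples. A minimal sketch, with `fit` and `error` left as placeholders for the actual model and loss:

```python
import random

def bootstrap_632(data, fit, error, B=200, seed=0):
    """0.632 bootstrap estimate of prediction error.
    data : list of (x, y) pairs
    fit  : callable(train) -> fitted model
    error: callable(model, samples) -> error rate on `samples`
    """
    rng = random.Random(seed)
    n = len(data)
    full_model = fit(data)
    err_resub = error(full_model, data)       # resubstitution: optimistic
    oob_errs = []
    for _ in range(B):
        idx = [rng.randrange(n) for _ in range(n)]   # sample with replacement
        train = [data[i] for i in idx]
        out = [data[i] for i in set(range(n)) - set(idx)]  # out-of-bag cases
        if out:
            oob_errs.append(error(fit(train), out))
    err_oob = sum(oob_errs) / len(oob_errs)   # out-of-bag: pessimistic
    return 0.368 * err_resub + 0.632 * err_oob
```

     The 0.632 weight reflects the fact that each bootstrap training set contains, on average, about 63.2% of the distinct observations, so the out-of-bag error alone would overstate the true error.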
     (5) Adjustment for oversampling. When the data are obtained by oversampling (that is, separate sampling), the probabilities the model estimates refer to the sample rather than the population, so predictions of disease probability for the general population may be substantially biased.
     To correct for oversampling, we use the prior probability to adjust the posterior probabilities, so that the adjusted results predict the probability of disease more objectively and accurately.
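     This prior-based adjustment works by reweighting the model's posterior odds: if π₁ is the population prior of disease and ρ₁ the proportion of cases in the oversampled data, each class's contribution is rescaled by its ratio of population prior to sample share. A sketch of this standard correction (function name ours):

```python
def adjust_posterior(p_sample, pi1, rho1):
    """Rescale a case-control (oversampled) posterior to the population.
    p_sample : model's estimated P(case | x) under the sample mix
    pi1      : population prior P(case)
    rho1     : proportion of cases in the oversampled sample
    """
    # reweight each class by (population prior) / (sample share)
    num = p_sample * pi1 / rho1
    den = num + (1.0 - p_sample) * (1.0 - pi1) / (1.0 - rho1)
    return num / den
```

     For example, in a balanced case-control sample (ρ₁ = 0.5) a fitted posterior of 0.5 carries no evidence beyond the prior, so with a true prior of π₁ = 0.1 the adjusted probability `adjust_posterior(0.5, 0.1, 0.5)` is 0.1; when ρ₁ equals π₁ the adjustment leaves the posterior unchanged.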
     In summary, we use three methods (LR, DT, NN) to predict the probability of DPN. For the small-sample setting, we carry out comparative studies and improvements in five respects (① scientific discretization of continuous variables, ② full use of the data while avoiding overfitting, ③ quick and efficient model building, ④ efficient use of the data to improve and compare generalization ability, ⑤ effective adjustment for oversampling to obtain more objective and accurate predictions), with satisfactory results in each. The modeling ideas and techniques can be readily transferred to other biomedical studies and to other fields.
