Exploration of Hospital Information Data Mining and Its Implementation Technology
Abstract
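This study explores how to implement online hospital data mining based on SPSS Clementine, with the aim of saving and sharing resources. On that basis, it examines the application of data mining to factor forecasting, discriminant diagnosis of disease, and disease association analysis through three case studies: the epidemic course and development trend of tuberculosis in Chongqing; the risk factors for, and discriminant classification models of, high-level (level III) axillary lymph node metastasis in breast cancer; and the discovery of association knowledge between diabetes and its complications. The intent is to give clinical managers, medical staff, and researchers a tool for decision support and comprehensive analysis in scientific management, in improving diagnosis and treatment, and in medical research. The "knowledge discovery" problem that is widespread in the information field urgently needs to be studied and solved. Methodologically, selecting an appropriate data mining algorithm on a scientific basis is the key to obtaining accurate knowledge rules, and implementing online hospital data mining has important practical value for improving hospital management and the quality of care.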
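     With the rapid development of computer technology and biomedical engineering research, computer information technology has come into wide use in medicine. Large volumes of medical information are now recorded precisely, and the rapidly growing data resources conceal much important and useful information. Mining deep, implicit, and valuable knowledge from these data is becoming increasingly important. To date, research on data mining in medical services has been reported in China, but no research on or application of an online analysis system has been reported, and methodological research on scientifically selecting an appropriate data mining algorithm for different practical objectives has not been reported before.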
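     This study uses the Java network programming language to implement online hospital data mining based on SPSS Clementine. The hospital data come from three medical institutions in Chongqing (the Anti-tuberculosis Institute of Chongqing and the First and Second Affiliated Hospitals of Chongqing University of Medical Sciences) and cover tuberculosis, breast cancer, and diabetes. The ARIMA model, the BP neural network model, and the GM(1,1) model are used to forecast tuberculosis incidence and are compared with one another. The Logistic model, the CHAID model, the RBFN model, the RBFN-Logistic hybrid model, and the RBFN-CHAID hybrid model are compared for discriminant classification of high-level axillary lymph node metastasis in breast cancer. The Apriori association analysis model is used to describe the strength of association between diabetes and its complications.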
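     Main research content: ① Exploring the implementation of online data mining using the Java network programming language. ② Analyzing the epidemic course of tuberculosis in Chongqing, the risk factors for high-level axillary lymph node metastasis in breast cancer, and the association between diabetes and its complications. ③ Forecasting tuberculosis incidence with the ARIMA model, the BP neural network model, and the GM(1,1) model. ④ Classifying high-level axillary lymph node metastasis in breast cancer with the Logistic model, the CHAID model, the RBFN model, the RBFN-Logistic hybrid model, and the RBFN-CHAID hybrid model. ⑤ Evaluating the accuracy and reliability of the models with the Accuracy and Reliability indicators.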
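     Results: ① SPSS Clementine was preliminarily integrated, and an online workflow covering hospital data collection, the execution engine, processing of analysis results, and querying of analysis results was implemented. ② Tuberculosis shows clear seasonal epidemic peaks: in most years the number of cases is lower in the first and third quarters and higher in the second and fourth quarters. The quarterly incidence in a tuberculosis epidemic year is related to the quarterly incidence over the one and a half epidemic years dating from a year earlier. An accurate forecast of tuberculosis incidence must take seasonal, periodic, and random factors into account. ③ Comparing the ARIMA, BPANN2, and GM(1,1) models, the relative errors of the first two in forecasting tuberculosis incidence were 0.05872 and 0.06999, against 0.01210 for the GM(1,1) model, which indicates that the residual GM(1,1) model forecasts tuberculosis incidence well. ④ High-level axillary lymph node metastasis in breast cancer is clearly related to the status of the middle and low-level axillary lymph nodes and to tumor size. ⑤ The RBFN model expresses diagnostic knowledge as a weight matrix, and the Logistic model and the RBFN-Logistic hybrid model express it as logistic regression coefficients; neither form is easy for users to interpret. The CHAID model and the RBFN-CHAID hybrid model express it in natural language as a tree, which makes the results easier to understand. ⑥ The average prediction accuracies of the Logistic, CHAID, RBFN, RBFN-Logistic hybrid, and RBFN-CHAID hybrid models were 83.34%, 83.79%, 85.61%, 83.77%, and 79.74%, and their values of |r − 1| (the absolute deviation of the reliability r from 1) were 0.0720, 0.0625, 0.0549, 0.0766, and 0.0948, respectively. The reliability of the knowledge obtained by the RBFN model and its accuracy on the test set were clearly better than those of the other algorithms. ⑦ The diagnostic rules extracted by the CHAID model are simple to read and convenient to apply, and they show how much each diagnostic indicator contributes to the diagnosis of high-level axillary lymph node metastasis in breast cancer. The CHAID decision tree shows that the status of the middle and low-level lymph nodes is decisive for this diagnosis and that tumor size is an important indicator. The CHAID model is therefore a simple and feasible method of computer-aided diagnosis that extracts diagnostic rules automatically from case records; it has broad practical value and can be applied to diagnostic research on other diseases. ⑧ Urinary tract infection, nephropathy, eye disease, neuropathy, hyperlipidemia, hypertension, heart disease, and coronary heart disease, among others, show a clear tendency to occur together with diabetes.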
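     Conclusions: ① Online hospital data mining will be an important component of future hospital information systems and has important practical value for improving hospital management and the quality of care and for reducing hospital operating costs. ② GM(1,1) was identified as the best algorithm for forecasting tuberculosis incidence. The best algorithm for classifying high-level axillary lymph node metastasis in breast cancer is the RBFN model, and the CHAID model, which ranks just behind it in classification accuracy and reliability, is also an excellent choice; this conclusion weighs ease of interpretation for the user together with classification accuracy and reliability. As an aid to physicians, the Apriori association analysis model prompts clinicians to attend to and investigate the true relationship between urinary tract infection and diabetes.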
Objective: To build practical, easy-to-operate online data mining software for hospital information, based on SPSS Clementine and integrated with the hospital information system; to discuss the application of data mining to variable forecasting, disease diagnosis, and disease association rules; and, as a methodological study, to analyze the prevalence of tuberculosis and its future trend, the risk factors for axillary level III lymph node metastasis in breast cancer and its classification model, and the association rules between diabetes and diabetic complications, using the most suitable data mining algorithm in each case. Online data mining for the hospital information system not only saves money and shares resources but also gives clinical managers, doctors, nurses, and other technical staff an efficient tool for comprehensive analysis and decision making, helping them to manage scientifically, to improve diagnostic accuracy and treatment outcomes, and to carry out medical research. In data mining methodology, selecting the most suitable algorithm on a scientific basis is the key step toward obtaining accurate knowledge.
     With the development of computer technology and biomedical engineering research, and the wide application of computer information technology in medicine, a great many precise medical records have been stored, and they contain much important knowledge. Mining the hidden, deep, and valuable knowledge in these records is becoming more and more important, because "knowledge discovery" in medical information is an urgent problem whose solution can improve hospital management and the quality of medical services. To date, online applications of data mining in medical services have been reported in the United States but not in China, and existing studies have not addressed how to select the most suitable data mining algorithm scientifically for different practical objectives.
     Method and Data: The Java network programming language was used to implement online data mining for the hospital information system based on SPSS Clementine. The Autoregressive Integrated Moving Average model (ARIMA), the Back-Propagation Artificial Neural Network model (BPANN), and the Grey model GM(1,1) were used to forecast the prevalence of tuberculosis, and the accuracy of the three algorithms was compared, based on data from the Anti-tuberculosis Institute of Chongqing. The Logistic model, the CHAID model, the Radial Basis Function Network model (RBFN), the combined RBFN-Logistic model, and the combined RBFN-CHAID model were used to classify the status of the axillary level III lymph nodes in breast cancer, and the accuracy and reliability of the five algorithms were compared, based on data from the First Affiliated Hospital of Chongqing University of Medical Sciences. The Apriori model was used to describe the association rules between diabetes and diabetic complications, based on data from the Second Affiliated Hospital of Chongqing University of Medical Sciences.
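     As a rough illustration of the grey-model forecasting step named above, the following is a minimal Python sketch of a basic GM(1,1) fit together with an average relative error of the kind used to compare forecasting models. It is not the study's implementation, which ran through SPSS Clementine and Java, and it omits any residual correction or seasonal handling. The function names and the sample series are hypothetical.

```python
import math


def gm11_fit(x0):
    """Estimate the GM(1,1) development coefficient a and grey input b."""
    if len(x0) < 4:
        raise ValueError("GM(1,1) needs at least four observations")
    # Accumulated generating operation (1-AGO): running cumulative sum.
    x1, total = [], 0.0
    for value in x0:
        total += value
        x1.append(total)
    # Background values: means of consecutive accumulated values.
    z = [0.5 * (x1[k] + x1[k - 1]) for k in range(1, len(x1))]
    y = list(x0[1:])
    m = len(z)
    sz, sy = sum(z), sum(y)
    szz = sum(v * v for v in z)
    szy = sum(zv * yv for zv, yv in zip(z, y))
    # Least squares for the grey equation x0[k] + a * z[k] = b,
    # i.e. the line y = slope * z + b with a = -slope.
    slope = (m * szy - sz * sy) / (m * szz - sz * sz)
    a, b = -slope, (sy - slope * sz) / m
    if a == 0:
        raise ValueError("series is constant; GM(1,1) is degenerate")
    return a, b


def gm11_predict(x0, steps_ahead=0):
    """Return fitted values for x0 followed by steps_ahead forecasts."""
    a, b = gm11_fit(x0)
    n = len(x0) + steps_ahead
    # Time response of the accumulated series, then difference it back.
    x1_hat = [(x0[0] - b / a) * math.exp(-a * k) + b / a for k in range(n)]
    return [x0[0]] + [x1_hat[k] - x1_hat[k - 1] for k in range(1, n)]


def mean_relative_error(actual, predicted):
    """Average of |actual - predicted| / |actual| over paired values."""
    pairs = list(zip(actual, predicted))
    return sum(abs(a - p) / abs(a) for a, p in pairs) / len(pairs)


if __name__ == "__main__":
    # Hypothetical quarterly rates, NOT data from the study.
    rates = [52.1, 54.0, 55.8, 58.3, 60.2, 62.9]
    fitted = gm11_predict(rates, steps_ahead=2)
    print("in-sample MRE:", mean_relative_error(rates, fitted[:len(rates)]))
    print("next two forecasts:", fitted[len(rates):])
```

     The closed-form least squares keeps the sketch free of third-party dependencies; a production version would more likely rely on a numerical library and validate that the input series is strictly positive.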
     Study content: ① Using the Java network programming language to explore the implementation of online data mining for the hospital information system based on SPSS Clementine. ② Analyzing the prevalence of tuberculosis in Chongqing, the risk factors for axillary level III lymph node metastasis in breast cancer, and the association rules between diabetes and diabetic complications. ③ Using three data mining algorithms, ARIMA, BPANN, and GM(1,1), to predict the prevalence of tuberculosis and comparing their accuracy. ④ Building two combination models, RBFN with Logistic and RBFN with CHAID. ⑤ Using the Logistic, CHAID, RBFN, combined RBFN-Logistic, and combined RBFN-CHAID models to classify the status of the axillary level III lymph nodes in breast cancer and comparing the accuracy and reliability of the five algorithms.
     Results: ① Online data mining software for the hospital information system based on SPSS Clementine was preliminarily set up, implementing data collection, engine execution, result storage, and result querying. ② The prevalence of tuberculosis shows a clear seasonal pattern that rises and falls through the year: in general the incidence goes down in the first and third quarters and goes up in the second and fourth. The incidence in a given quarter is correlated with the quarterly incidence over the one and a half epidemic years (six quarters) dating from a year earlier. Predictions are accurate only when the seasonal, cyclical, and random factors of tuberculosis are taken into account. ③ The average relative errors of the ARIMA, BPANN2, and GM(1,1) predictive models are 0.05872, 0.06999, and 0.01210, respectively, which indicates that GM(1,1) is the most accurate of the three for predicting the prevalence of tuberculosis. ④ The status of the axillary level III lymph nodes in breast cancer is significantly correlated with the status of the axillary level I and II lymph nodes and with tumor size. ⑤ Some expressions of diagnostic knowledge are difficult for users to understand: the RBFN expresses it as a weight matrix, and the Logistic model and the RBFN-Logistic combination express it as logistic regression coefficients. The CHAID model and the RBFN-CHAID combination express it as a tree diagram in natural language, which is easy to understand. ⑥ The average predictive accuracies of the Logistic, CHAID, RBFN, RBFN-Logistic, and RBFN-CHAID models are 83.34%, 83.79%, 85.61%, 83.77%, and 79.74%, respectively, and the absolute values of their reliabilities minus 1 are 0.0720, 0.0625, 0.0549, 0.0766, and 0.0948, respectively. The accuracy and reliability of the RBFN are higher than those of the other four methods, which means that the RBFN is the best algorithm for classifying the status of the axillary level III lymph nodes in breast cancer. ⑦ The order of influence of the diagnostic indexes can be read from the diagnostic knowledge of the CHAID model, which is presented as a tree diagram: the status of the axillary level I and II lymph nodes and tumor size are very important for classifying the status of the axillary level III lymph nodes in breast cancer. CHAID is a simple, practical computer-based diagnostic method that extracts diagnostic knowledge automatically from records, so it can be widely applied to research on breast cancer and other diseases. ⑧ Eight conditions are significantly related to diabetes: urinary tract infection, diabetic nephropathy, diabetic eye disease, diabetic neuropathy, hyperlipidemia, hypertension, diabetic heart disease, and coronary heart disease.
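     To make the association-rule result in item ⑧ more concrete, here is a small Python sketch of Apriori-style frequent itemset mining with a rule confidence measure. It is an illustration under stated assumptions rather than the Clementine Apriori node used in the study: the function names, the support threshold, and the toy patient records are all hypothetical.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    n = len(transactions)
    if n == 0:
        raise ValueError("no transactions given")

    def support(itemset):
        # Fraction of transactions that contain every item in itemset.
        return sum(1 for t in transactions if itemset <= t) / n

    items = {item for t in transactions for item in t}
    current = {}
    for item in items:
        s = support(frozenset([item]))
        if s >= min_support:
            current[frozenset([item])] = s

    frequent, k = {}, 1
    while current:
        frequent.update(current)
        k += 1
        # Join step: merge frequent (k-1)-itemsets into k-item candidates.
        candidates = {a | b for a in current for b in current
                      if len(a | b) == k}
        # Prune step: every (k-1)-subset of a candidate must be frequent.
        candidates = {c for c in candidates
                      if all(frozenset(sub) in current
                             for sub in combinations(c, k - 1))}
        current = {}
        for c in candidates:
            s = support(c)
            if s >= min_support:
                current[c] = s
    return frequent


def rule_confidence(frequent, antecedent, consequent):
    """Confidence of antecedent -> consequent from frequent-itemset supports."""
    a, c = frozenset(antecedent), frozenset(consequent)
    return frequent[a | c] / frequent[a]


if __name__ == "__main__":
    # Hypothetical diagnosis sets per patient, NOT data from the study.
    records = [
        {"diabetes", "hypertension"},
        {"diabetes", "hypertension", "nephropathy"},
        {"diabetes", "nephropathy"},
        {"diabetes"},
        {"hypertension"},
    ]
    found = apriori(records, min_support=0.4)
    print(rule_confidence(found, {"diabetes"}, {"hypertension"}))
```

     Support measures how often a combination of diagnoses appears at all, and confidence measures how often the consequent appears when the antecedent is present; a high value of either suggests a link worth clinical follow-up but does not by itself establish a causal relationship.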
     Conclusions: ① Online data mining of hospital data based on SPSS Clementine, a very important part of the hospital information system, has been preliminarily implemented. Combining the hospital information system with data mining will extend the use of computer information technology, improve hospital management, raise the quality of medical services, and reduce hospital operating costs. ② GM(1,1) was confirmed as the best of the tested algorithms for predicting the prevalence of tuberculosis. The RBFN and CHAID models are the two best algorithms for classifying the status of the axillary level III lymph nodes in breast cancer, a conclusion based on how each of the five algorithms expresses diagnostic knowledge and on their accuracy and reliability. As an assistant tool, Apriori can prompt doctors to investigate the real correlation between diabetes and urinary tract infection, which is seldom reported in medical journals.
