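A Comparative Study of Classification Methods Applied to Syndrome Differentiation and Diagnosis in Traditional Chinese Medicine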
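Abstract
Background:
     In Traditional Chinese Medicine (TCM) research, syndrome differentiation is the core of TCM and the precondition for therapeutic efficacy. To study the rules by which syndromes are classified, epidemiological methods, multivariate statistics, machine learning, neural networks and many other methods have been introduced, and many competing approaches now coexist.
     However, different methods produce different classifiers, and the quality of a classifier directly affects the efficiency and accuracy of data mining. Most studies that apply data analysis or data mining methods to TCM syndrome diagnosis are confined to the one method under study; a reasonably comprehensive, in-depth side-by-side comparison of the typical methods has not yet been made. In addition, model evaluation methods are used inconsistently and without standardization, so one-sided assessments are hard to avoid. Correctly evaluating the value of each classification method in TCM syndrome research, together with its strengths and weaknesses, in order to guide the choice of method, is a prerequisite for the sound use of methodology in multidisciplinary research on TCM modernization and is a research direction with broad application prospects.
     The syndrome and treatment patterns of primary insomnia are a current focus of clinical research, and here too many methods are used side by side without consensus. This study takes primary insomnia as its entry point and builds a data platform on it. After statistical preprocessing and attribute reduction based on correlation analysis, principal component analysis and rough sets, representative classification methods from statistics, machine learning and neural networks were applied: logistic regression, the Bayesian classifier, a rule-based classifier (PARI), the C4.5 decision tree, and BP and RBF neural networks, together with the probabilistic neural network and the support vector machine. These were used to classify TCM syndromes from clinical data on primary insomnia, so as to compare the methods side by side, assess their value for TCM syndrome classification, and propose data reduction, classification and model evaluation methods suited to the characteristics of TCM data.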
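     Objective:
     1 To build TCM syndrome classification models for primary insomnia with the support vector machine and the probabilistic neural network, assess their value for TCM syndrome classification, compare them with several other commonly used classification methods, and analyse the characteristics, strengths and weaknesses of each algorithm.
     2 To compare and assess the value of three attribute reduction methods (based on correlation analysis, principal component analysis and rough sets) for processing TCM syndrome data.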
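     Methods:
     This was a cross-sectional survey. Based on domestic and international reports on primary insomnia and on TCM theory, an "Insomnia Clinical Observation Form" comprising Western medicine scales and a TCM syndrome questionnaire was drawn up. It was used to survey patients with primary insomnia attending the neurology (internal medicine) outpatient clinic or the sleep and psychology clinic at the Dade Road headquarters and the Fangcun branch of the Second Affiliated Hospital of Guangzhou University of Chinese Medicine.
     A database was built with Epidata4.1a according to the observation form and the data were entered. After preprocessing (imputation of missing values, discretization and normalization), the data were reduced in dimension by three methods: correlation analysis in SPSS13.0 (Spearman correlation coefficients, deleting variables whose coefficient had a P value greater than 0.05); principal component analysis in SPSS13.0 (retaining syndrome information with eigenvalue > 1 and communality > 0.4); and rough-set attribute reduction in Rosetta (reduction based on the discernibility matrix).
     Using an improved sample division method, the database was split 5:1 (450/92 records): the first 92 records in random-number order formed the validation set and the remaining 450 the training set.
     Models were then built on the training data from each of the three reduction methods: logistic regression (Forward LR and Backward LR models) in SPSS13.0; the Bayesian classifier, the rule-based classifier (PARI) and the C4.5 decision tree in WEKA3.5.7; the BP, RBF and probabilistic neural networks with the neural network toolbox of MATLAB7.0; and the support vector machine (polynomial, radial basis function and sigmoid kernel models) in LIBSVM2.85.
     On the training set, goodness of fit and classification performance were evaluated by resubstitution (testing on the training data itself) and by 5-fold cross-validation. The main indices were sensitivity, specificity, accuracy, missed-diagnosis rate, misdiagnosis rate, Youden index, positive and negative predictive values, positive and negative likelihood ratios, agreement (Kappa) and the ROC curve.
     The validation data were then used for a prospective evaluation of predictive performance, with accuracy, Kappa, mean absolute error and root mean squared error as indices.
     The three reduction methods were compared mainly on attribute evaporation rate, the computational cost and complexity of the resulting models, and the classification and predictive performance of those models.
     These indices were used to judge the relative merits of the three reduction methods and of the binary classifiers.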
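     Results:
     In total 414 patients with primary insomnia were enrolled; 128 were observed at two time points and 286 at one. Treating each time point as a cross-section gave 542 syndrome records, with overlapping syndrome types among them. Pathogenic fire derived from stagnation of liver-qi was the most frequent syndrome, with 183 records, and was used as the example for building classifiers.
     1 There were 95 original independent variables (PSQI indices, symptoms and signs, excluding pale-red tongue and thin white coating). Correlation-based reduction yielded a 55-attribute subset, principal component reduction a 33-attribute subset, and rough-set reduction the smallest subset, with only 19 attributes. The attribute evaporation rates were 42.105%, 40.000% and 65.455% respectively, the highest being that of rough-set reduction; the models built on the rough-set subset all outperformed the principal-component models and were better than or comparable to the correlation-based models.
     2 For every model, accuracy under resubstitution was higher than under cross-validation, in some models by nearly 20 percentage points. When models with high resubstitution accuracy were then used to predict the validation set, accuracy fell markedly.
     3 Logistic regression: the fitted Backward LR models were better than or similar to the Forward LR models on every index. For both forward and backward models, the areas under the ROC curve (AUC) in 5-fold cross-validation did not differ significantly across the three reduction methods. For the backward models, the mean 5-fold cross-validation accuracy was 86.222% and the mean AUC 0.904, with no significant difference among the three; the mean prediction accuracy was 89.855%.
     4 Bayesian classifier: 5-fold cross-validation accuracy ranged from 79.111% to 87.556% (mean 84.148%) and the mean AUC was 0.895. The models built on the correlation-based and rough-set subsets differed significantly from the principal-component model. Prediction accuracy ranged from 83.696% to 92.391% (mean 89.130%).
     5 Rule-based classifier: the three models contained 5, 4 and 5 rules respectively. The rules covered few of the training cases, and resubstitution and 5-fold cross-validation results differed widely. Cross-validation accuracy fluctuated between 77.778% and 87.556% (mean 83.037%) and the mean AUC was 0.829; the correlation-based and rough-set models differed significantly from the principal-component model. Prediction accuracy was 79.348% to 91.304% (mean 85.507%).
     6 C4.5 decision tree: the three trees had 15, 12 and 10 nodes and trained quickly. However, all three covered only attributes of the form "if the condition holds, the result is positive", so overall classification ability was moderate: accuracy was around 85% and the mean 5-fold cross-validation AUC about 0.834. The rough-set model was significantly better than the other two. Prediction accuracy was 83.696% to 89.130% (mean 86.957%).
     7 Support vector machine: of the three kernels, the radial basis function (RBF) kernel classified best, surpassing the other two on every index; its 5-fold cross-validation AUC differed significantly from that of the sigmoid kernel, and it used fewer support vectors. Accuracy rose markedly after parameter optimization. The model built on the correlation-based subset reached a classification accuracy of 100%, and the other two reached 88.222% and 92.222%. The 5-fold cross-validation AUC was above 0.94, and the rough-set and principal-component models differed significantly. Prediction accuracy was above 92%.
     8 BP network: the three networks had 4, 3 and 5 hidden nodes. Setting the parameters was time-consuming. Classification accuracy ranged from 81.778% to 89.111% (mean 85.185%) and the mean AUC was 0.889; the correlation-based model was significantly better than the other two. Prediction accuracy fluctuated widely, from 73.913% to 95.652% (mean 86.594%), and prediction error was large.
     9 RBF neural network: each of the three networks had 3 hidden nodes. Learning was faster than with the BP network and parameter setting simpler. The mean 5-fold cross-validation accuracy was 88.741% and the AUC was above 0.89; all pairwise differences among the three models were statistically significant. The mean prediction accuracy was 90.217%.
     10 Probabilistic neural network (PNN): few parameters and fast to run. 5-fold cross-validation accuracy was above 86% in every case and approached 95%, with a mean of 91.111%. The AUC was above 0.93 (mean 0.967); the principal-component model was significantly worse than the other two. Prediction accuracy was above 90% in every case (mean 93.840%).
     11 On the basis of the 5-fold cross-validation AUC and the hypothesis tests, the eight models were graded by classification performance:
     Correlation-based reduction: SVM > PNN > Logistic, RBF > PARI, BP, C4.5; Bayes did not differ significantly from either of the last two groups and so falls between grades 3 and 4.
     Principal component reduction: SVM, PNN > RBF, Bayes > C4.5, PARI; Logistic and BP did not differ significantly from RBF, Bayes or C4.5 and so fall between grades 2 and 3.
     Rough-set reduction: PNN > SVM > Bayes, Logistic, BP, C4.5 > PARI; RBF did not differ significantly from PNN or SVM and so falls between grades 1 and 2.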
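     Conclusions:
     1 Rough-set attribute reduction removes as much unnecessary knowledge from the information system (decision table) as possible while keeping classification ability high, yielding a small attribute set that classifies the syndrome well. It is a reduction method worth promoting for TCM syndrome data.
     2 Resubstitution tends to overestimate classification performance, so it has little practical value and is unsuitable for objective model evaluation. The results of 5-fold cross-validation are more stable and reflect the real classification ability of the model; in particular, in the presence of interference it avoids large fluctuations in the results. Cross-validation should be used wherever possible in future studies to evaluate classification performance objectively.
     3 Compared with traditional indices, the ROC curve is highly reliable, gives an objective and precise description and, in particular, is not affected by the data environment; the areas under the curves of two diagnostic tests can also be compared by hypothesis test, making the results more intuitive and objective.
     4 Overall, all eight models have some diagnostic value: SVM, PNN and RBF are the best, Logistic, the Bayesian classifier and BP come next, and C4.5 and PARI are moderate.
     5 Logistic regression has a well-developed system for model evaluation, revision and diagnostics, and clearly shows the size and direction of each independent variable's contribution to the model. It is, however, easily affected by the collinearity and strongly influential points found in TCM syndrome data, and both its prediction accuracy and its error rank in the middle of the eight models. Models built by Backward LR were slightly better than those built by Forward LR; since Backward LR, when screening variables, tends to bring in variables with strong joint effects, Backward LR is recommended for TCM syndrome data, in which correlation is widespread.
     6 The Bayesian classifier is easily affected by frequencies and prior probabilities; its classification performance is similar to that of logistic regression.
     7 The rule-based classifier produces easily understood rules along with the strength of each rule, but its classification and predictive abilities are poor and it is not robust. It is therefore suitable for extracting rules that help in understanding the content of TCM syndromes, but not for classification or prediction.
     8 The C4.5 decision tree produces a tree diagram that shows at a glance how much each attribute contributes to syndrome discrimination, and it is fairly robust to strongly influential points. However, its sensitivity, misdiagnosis rate, negative predictive value and negative likelihood ratio are relatively low, while its missed-diagnosis rate, specificity, positive predictive value and positive likelihood ratio are relatively high; its classification ability is moderate and its prediction error large. We consider the model suitable for building decision trees that aid an intuitive understanding of TCM syndromes, but not for classification or prediction.
     9 Among support vector machines, the RBF-kernel model is well suited to TCM syndrome data: its classification performance and prediction accuracy are better than those of the polynomial and sigmoid kernels, it uses fewer support vectors, and it generalizes well. The RBF kernel is therefore a good choice when SVM is used for TCM syndrome classification. SVM constructs an optimal hyperplane for TCM syndrome data, so that data that are not linearly separable are divided with high accuracy in the feature space; its classification performance is better than that of the other classifiers, and the model is robust and generalizes well. Introducing SVM into TCM syndrome research is feasible and effective.
     10 For TCM syndrome diagnosis the BP network learns slowly, generalizes poorly and is prone to local minima; in addition, feature vectors for TCM syndromes are hard to obtain and diagnostic accuracy is not high. Its practical usefulness is therefore limited and it is hard to promote.
     11 The RBF neural network learns faster than the BP network and is simpler to configure; it recognizes, classifies and predicts TCM syndrome data well and is fairly robust. It is a method suited to TCM syndrome research.
     12 The PNN has few parameters, runs fast and is fairly robust. Its classification performance and prediction accuracy are high, second only to SVM, and it generalizes well. It recognizes the class information in TCM syndrome data and completes syndrome classification and prediction satisfactorily, and is worth promoting in TCM syndrome classification research.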
Background:
     In Traditional Chinese Medicine (TCM) research, syndrome differentiation is the core of TCM and the precondition for efficacy. To study the rules of TCM syndrome classification, epidemiological methods, multivariate statistical methods, machine learning, neural networks and many other methods have been introduced, so that many competing approaches now coexist.
     However, different methods produce different classifiers, and the quality of a classifier directly influences the efficiency and accuracy of data mining. At present, most research on applying data analysis or data mining methods to TCM syndrome differentiation is limited to the single method being used; a more comprehensive side-by-side comparison of the typical data analysis and data mining methods has not yet been made. Furthermore, model evaluation methods are used inconsistently and irregularly, so a partial view is difficult to avoid. Correctly evaluating the value of each classification method in TCM syndrome differentiation research, together with its merits and drawbacks, in order to guide the choice of classification method, is a prerequisite for the reasonable application of methods in multidisciplinary research on TCM modernization and has broad prospects for future research.
     The syndrome differentiation and treatment rules of primary insomnia are one of the focuses of current clinical research, and the methods applied to them are equally varied and unsettled. This research takes primary insomnia as its object of investigation and collects the related clinical data. On this data platform, we first carry out attribute reduction based on statistical processing and on the rough set method. We then apply typical classification methods from statistics, machine learning and neural networks (logistic regression, the Bayesian classifier, a rule-based classifier, the C4.5 decision tree, BP and RBF neural networks, and also the probabilistic neural network and the support vector machine) to classify the TCM clinical data on primary insomnia. We compare these methods side by side and assess their value for TCM syndrome classification. In this way we discuss the data reduction, classification and model evaluation methods that suit the characteristics of TCM data.
     Objective:
     1 To establish classification models for the syndrome of pathogenic fire derived from stagnation of liver-qi in primary insomnia using the support vector machine and the probabilistic neural network, to assess their value for TCM syndrome classification, and to compare them with several other commonly used classification methods and evaluate the characteristics of each.
     2 To compare three attribute reduction methods (based on correlation analysis, principal component analysis and rough sets) and assess their value for data processing in TCM syndrome research.
     Methods:
     This study was a cross-sectional survey. Based on domestic and foreign research reports on primary insomnia and on TCM theory, we drew up an "Insomnia Clinical Observation Questionnaire" comprising Western medicine scales and a TCM syndrome questionnaire, and used it to survey outpatients with primary insomnia at Guangdong Province Hospital of TCM.
     A database was established with Epidata4.1a according to the content of the questionnaire. After data preprocessing (filling in missing values, discretization and normalization), three attribute reduction (dimension reduction) methods were applied: bivariate correlation analysis in SPSS 13.0 (Spearman correlation coefficients, filtering out attributes whose P value was above 0.05), principal component analysis in SPSS 13.0 (extracting attributes with eigenvalue above 1 and communality above 0.4), and rough set reduction in the ROSETTA software.
     The database was split into two parts by the improved sample division method at a ratio of 5:1 (450/92 records). The first 92 records in random-number order formed the test set and the rest formed the training set. Models were then built on the training data from each of the three reductions by the following methods: logistic regression (Forward LR and Backward LR models) in SPSS 13.0; the Bayesian classifier, the rule-based classifier (PARI) and the C4.5 decision tree in WEKA3.5.7; the BP, RBF and probabilistic neural networks with the neural network toolbox of MATLAB7.0; and the support vector machine (polynomial, radial basis function and sigmoid kernel models) in LIBSVM 2.85.
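     As an illustration of the hold-out step, the following is a minimal Python sketch of a plain random 450/92 split. The abstract does not say what the "improved" sample division method changes, so this shows only the ordinary version of the idea; the function name `split_records`, the seed and the use of integer record ids are assumptions made for the example.

```python
import random


def split_records(n_records=542, n_test=92, seed=0):
    """Return (train_ids, test_ids) for a random hold-out split.

    Hypothetical helper: every record gets a random number and the
    records with the n_test smallest numbers become the test set.
    """
    rng = random.Random(seed)
    random_numbers = [rng.random() for _ in range(n_records)]
    order = sorted(range(n_records), key=lambda i: random_numbers[i])
    test_ids = sorted(order[:n_test])     # first 92 in random-number order
    train_ids = sorted(order[n_test:])    # remaining 450
    return train_ids, test_ids


# 128 patients observed twice plus 286 observed once give 542 records.
n_records = 128 * 2 + 286
train_ids, test_ids = split_records(n_records)
print(len(train_ids), len(test_ids))
```

     Note that a record-level split like this one can place two observations of the same patient on opposite sides of the split; whether the thesis guarded against that is not stated in the abstract.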
     On the training set, resubstitution (testing on the training data itself) and five-fold cross-validation were used to evaluate the goodness of fit and the classification performance of the models. The main assessment indices were sensitivity, specificity, accuracy, rate of missed diagnosis, rate of misdiagnosis, Youden index, positive predictive value, negative predictive value, positive likelihood ratio, negative likelihood ratio, agreement (Kappa) and the ROC curve.
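     For a binary classifier, all of the indices listed above except the ROC curve follow from the four cells of a 2x2 table. The sketch below uses the standard textbook definitions, taking the rate of missed diagnosis as 1 - sensitivity and the rate of misdiagnosis as 1 - specificity; the function name and the example counts are invented for illustration and are not data from the study. It assumes no denominator is zero.

```python
def diagnostic_metrics(tp, fp, fn, tn):
    """Standard indices from a 2x2 table (true/false positives/negatives)."""
    n = tp + fp + fn + tn
    sens = tp / (tp + fn)
    spec = tn / (tn + fp)
    acc = (tp + tn) / n
    # Agreement expected by chance, from the row and column totals.
    p_exp = ((tp + fp) * (tp + fn) + (fn + tn) * (fp + tn)) / n ** 2
    return {
        "sensitivity": sens,
        "specificity": spec,
        "accuracy": acc,
        "missed_diagnosis_rate": 1 - sens,   # false negative rate
        "misdiagnosis_rate": 1 - spec,       # false positive rate
        "youden_index": sens + spec - 1,
        "ppv": tp / (tp + fp),
        "npv": tn / (tn + fn),
        "lr_positive": sens / (1 - spec),
        "lr_negative": (1 - sens) / spec,
        "kappa": (acc - p_exp) / (1 - p_exp),
    }


# Made-up counts, for illustration only.
metrics = diagnostic_metrics(tp=40, fp=10, fn=10, tn=90)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```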
     The models were then used to predict the classes of the test set as a prospective evaluation, with accuracy, Kappa, mean absolute error and root mean squared error as indices.
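     The two error indices can be sketched as follows. The abstract does not state which quantities the errors were computed on; the example assumes they compare a predicted probability of the positive class with the 0/1 label, and the values are invented.

```python
import math


def mae_rmse(y_true, p_pred):
    """Mean absolute error and root mean squared error.

    y_true: 0/1 labels; p_pred: predicted probability of class 1
    (an assumption; the abstract does not specify the inputs).
    """
    errors = [p - y for y, p in zip(y_true, p_pred)]
    mae = sum(abs(e) for e in errors) / len(errors)
    rmse = math.sqrt(sum(e * e for e in errors) / len(errors))
    return mae, rmse


mae, rmse = mae_rmse([1, 0, 1, 0], [0.9, 0.2, 0.6, 0.1])
print(round(mae, 3), round(rmse, 3))
```

     Because errors are squared before averaging, RMSE is never smaller than MAE and grows faster when a few predictions are badly wrong.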
     The three attribute reduction methods were assessed on attribute evaporation rate, computational cost and model complexity, and the classification and prediction performance of the resulting models.
     With all these indices we judged the relative merits of the three reduction methods and of the binary classification models.
     Results:
     A total of 414 patients with primary insomnia were enrolled; 128 were observed twice and 286 once. Taking each observation time as a cross-section, 542 syndrome records were collected, with overlapping syndromes among them. The most frequent syndrome was pathogenic fire derived from stagnation of liver-qi, with 183 records, and it was used as the example for building the classifiers.
     1 There were 95 original variables (PSQI, symptoms and signs, excluding light red tongue and thin whitish fur). Reduction by bivariate correlation analysis gave a subset of 55 attributes and principal component reduction a subset of 33 attributes; the subset from rough set reduction was the smallest, with only 19 attributes, and had the highest attribute evaporation rate (65.455%). The models built on it were better than the principal component reduction models and better than or similar to the correlation analysis reduction models.
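     The abstract does not define the attribute evaporation rate. A natural reading is the share of attributes removed by a reduction, and the short check below takes that as an assumption. Under this definition the reported 65.455% for the rough set subset is reproduced when the 19 attributes are measured against the 55-attribute correlation subset rather than against the 95 original variables; this is an inference from the arithmetic, not something the abstract states.

```python
def evaporation_rate(n_before, n_after):
    """Percentage of attributes removed (assumed definition)."""
    return 100.0 * (n_before - n_after) / n_before


# Measured against the 55-attribute correlation subset.
print(round(evaporation_rate(55, 19), 3))  # rough sets: 65.455
print(round(evaporation_rate(55, 33), 3))  # principal components: 40.0

# Measured against the 95 original variables.
print(round(evaporation_rate(95, 55), 3))  # correlation analysis: 42.105
print(round(evaporation_rate(95, 19), 3))  # rough sets: 80.0
```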
     2 For every kind of model, accuracy under resubstitution was higher than under cross-validation; in some models the difference reached nearly 20 percentage points. When models with high resubstitution accuracy were then used on the test set, accuracy turned out to be markedly lower.
     3 Logistic regression: the Backward LR models were superior or similar to the Forward LR models on all indices. For both forward and backward models, the areas under the ROC curve (AUC) in 5-fold cross-validation did not differ significantly across the three reduction methods. For the Backward LR models, the average classification accuracy was 86.222% and the average AUC in 5-fold cross-validation was 0.904, without significant differences, and the average prediction accuracy was 89.855%.
     4 Bayesian classifier: the accuracy of the Bayesian classifiers built on the three reduction results ranged from 79.111% to 87.556%, with an average of 84.148%. The average AUC in 5-fold cross-validation was 0.895, and the models from the rough set and correlation reductions differed significantly from the model from the principal component reduction. Prediction accuracy ranged from 83.696% to 92.391%, with an average of 89.130%.
     5 Rule-based classifier: the models built on the three reductions contained 5, 4 and 5 rules respectively. The rules covered relatively few of the training cases, and there was a large gap between resubstitution accuracy and 5-fold cross-validation accuracy. The accuracy of the three models fluctuated between 77.778% and 87.556%, with an average of 83.037%. The AUC in 5-fold cross-validation averaged 0.829, and prediction accuracy was 79.348% to 91.304%, with an average of 85.507%.
     6 C4.5 decision tree: the decision trees built on the three reduction results had 15, 12 and 10 nodes respectively. Training was fast. However, the three models covered only attributes of the form "if the condition holds, the result is positive", so overall classification capability was mediocre. Accuracy was about 85%. The AUC in 5-fold cross-validation averaged approximately 0.834, and that of the rough set reduction model was significantly larger than those of the other two models. Prediction accuracy was 83.696% to 89.130%, with an average of 86.957%.
     7 SVM: among the three kernel models, the radial basis function (RBF) kernel model classified best, surpassing the other two on all indices. Its AUC in 5-fold cross-validation differed significantly from that of the sigmoid kernel model, and it used fewer support vectors. After parameter optimization, accuracy increased significantly. The classification accuracy of the model built on the correlation analysis reduction reached 100%, and those of the other two models were 88.222% and 92.222%. The AUC in 5-fold cross-validation was above 0.94, and that of the rough set reduction model was significantly better than that of the principal component reduction model. Prediction accuracy was above 92%.
     8 BP network: three BP networks with 4, 3 and 5 hidden nodes respectively were built on the three reduction results. Parameter setting was time-consuming, and classification and prediction accuracy were volatile, with high prediction error. Classification accuracy was 81.778% to 89.111%, with an average of 85.185%. The average AUC was 0.889, and that of the correlation reduction model was significantly better than those of the other two reduction models. Prediction accuracy fluctuated markedly between 73.913% and 95.652%, with an average of 86.594%.
     9 RBF neural network: each of the three reduction subsets produced an RBF network with 3 hidden nodes. Learning was faster than with the BP network and parameter setting was simpler. The average classification accuracy was 88.741%. The AUC in 5-fold cross-validation was above 0.89, and all pairwise comparisons among the three reduction models showed significant differences. The average prediction accuracy was 90.217%.
     10 Probabilistic neural network (PNN): the models had few parameters and ran fast. Classification accuracy in 5-fold cross-validation was above 86% in every case and reached approximately 95%, with an average of 91.111%. The AUC in 5-fold cross-validation was above 0.93, with an average of 0.967, and that of the principal component reduction model was significantly lower than those of the other two reduction models. Prediction accuracy was above 90% in every case, with an average of 93.840%.
     11 According to the 5-fold cross-validation AUC and the hypothesis test results, the eight models were graded by classification performance:
     Correlation reduction models: SVM > PNN > Logistic, RBF > PARI, BP, C4.5. The Bayesian classifier did not differ significantly from the models in either of the last two groups, so it falls between grades 3 and 4.
     Principal component reduction models: SVM, PNN > RBF, Bayes > C4.5, PARI. Logistic and BP did not differ significantly from RBF, Bayes or C4.5, so they fall between grades 2 and 3.
     Rough set reduction models: PNN > SVM > Bayes, Logistic, BP, C4.5 > PARI. RBF did not differ significantly from PNN or SVM, so it falls between grades 1 and 2.
     Conclusions:
     1 Models built after rough set attribute reduction maintain a high classification capability. The reduction eliminates as much unnecessary knowledge from the information system (decision table) as possible and yields a small subset that classifies well. It is therefore a reduction method worth using in TCM syndrome data processing.
     2 Resubstitution tends to overestimate the performance of a classifier, so it has little practical value and is not suitable for the objective evaluation of models. The results of 5-fold cross-validation are more stable and reflect the true classification capacity of the models; especially when the data contain interference, it avoids large fluctuations in the classification results. We recommend that future studies use cross-validation wherever possible to evaluate classifiers objectively.
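     The mechanics of 5-fold cross-validation on the 450 training records can be sketched as below: the records are shuffled once and dealt into five folds, and each fold serves once as the held-out part while a model is fitted on the other four. The function name and seed are assumptions for the example, and the abstract does not say whether the folds in the study were stratified by class; this sketch is unstratified.

```python
import random


def k_fold_indices(n, k=5, seed=0):
    """Yield (train_idx, held_out_idx) pairs for k-fold cross-validation."""
    idx = list(range(n))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::k] for i in range(k)]   # deal the indices into k folds
    for i in range(k):
        held_out = folds[i]
        train = [j for f, fold in enumerate(folds) if f != i for j in fold]
        yield train, held_out


folds = list(k_fold_indices(450, k=5))
for train, held_out in folds:
    # Fit on `train`, score on `held_out`; resubstitution would instead
    # score on the same records the model was fitted to.
    print(len(train), len(held_out))
```

     Every record is scored exactly once by a model that never saw it during fitting, which is why the averaged result is less optimistic than resubstitution.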
     3 Compared with traditional evaluation indices, the ROC curve has the advantages of high reliability and accurate, objective description, and in particular it is little affected by the data environment. The areas under the curves (AUC) of two diagnostic tests can be compared by hypothesis test, so its results are more intuitive and objective.
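     The area under the ROC curve has a direct interpretation: it is the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case, with ties counted as one half. The sketch below computes it that way; the scores are invented, and the hypothesis test for comparing two AUCs is not shown.

```python
def auc(scores_pos, scores_neg):
    """AUC as the fraction of (positive, negative) pairs ranked correctly."""
    wins = 0.0
    for sp in scores_pos:
        for sn in scores_neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5      # a tie counts as half a correct ranking
    return wins / (len(scores_pos) * len(scores_neg))


# Invented classifier scores for three positive and three negative cases.
print(auc([0.9, 0.8, 0.4], [0.7, 0.3, 0.4]))
```

     Since only the ordering of scores matters, the value does not depend on any particular decision threshold.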
     4 Overall, the eight models applied in this study all have some diagnostic value: SVM, PNN and RBF are the best, followed by Logistic and the Bayesian classifier, while BP, C4.5 and PARI are mediocre.
     5 The logistic regression model has a well-developed system for evaluation and revision and clearly shows the magnitude and direction of each attribute's contribution to the model. However, it is easily affected by collinearity and strongly influential points, and its prediction accuracy and error rank in the middle of the eight models. The Backward LR model is superior to the Forward LR model. Because Backward LR, in variable selection, favours variables with strong joint action, the Backward LR model is suggested for TCM syndrome data, in which correlation is common.
     6 The Bayesian classifier is vulnerable to the influence of frequencies and prior probabilities. Its performance is similar to that of the logistic regression model.
     7 The rule-based classifier generates easy-to-understand rules and shows the strength of each rule, but its classification and prediction capabilities are poor and it lacks stability. The model is therefore suitable for extracting rules that help in understanding the content of TCM syndromes, but unfit for classification and prediction research.
     8 The C4.5 decision tree generates a tree diagram that helps in understanding intuitively how attributes contribute to syndrome discrimination, and it is robust to strongly influential points. However, its sensitivity, rate of misdiagnosis, negative predictive value and negative likelihood ratio are relatively low, while its rate of missed diagnosis, specificity, positive predictive value and positive likelihood ratio are relatively high. Its classification capability is mediocre and its prediction error high. We suggest that the model is suitable for forming decision trees that aid an intuitive understanding of TCM syndromes, but not for classification and prediction research.
     9 The radial basis function (RBF) kernel model of the support vector machine is well suited to the analysis of TCM syndrome data: its classification and prediction accuracy are superior to those of the polynomial and sigmoid kernel models, it uses fewer support vectors, and it generalizes well. The RBF kernel is therefore a good option for a TCM syndrome classification study. SVM constructs an optimal hyperplane for TCM syndrome data, which gives a relatively accurate division of nonlinearly separable data in the feature space. Its classification capability is better than that of the other classifiers, with better robustness and generalization. For these reasons, SVM is feasible and effective in TCM syndrome research.
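     The three kernels compared here can be written out as plain functions. The forms below are the ones documented for LIBSVM (polynomial, RBF and sigmoid kernels); the parameter values `gamma`, `coef0` and `degree` shown as defaults are placeholders, since the abstract does not report the values selected during parameter optimization.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def polynomial_kernel(u, v, gamma=1.0, coef0=0.0, degree=3):
    """(gamma * u.v + coef0) ** degree"""
    return (gamma * dot(u, v) + coef0) ** degree


def rbf_kernel(u, v, gamma=1.0):
    """exp(-gamma * ||u - v||^2); equals 1 when u == v, decays with distance."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def sigmoid_kernel(u, v, gamma=1.0, coef0=0.0):
    """tanh(gamma * u.v + coef0)"""
    return math.tanh(gamma * dot(u, v) + coef0)


# Two made-up symptom vectors (1 = present, 0 = absent).
u, v = [1, 0, 1, 1], [1, 1, 0, 1]
print(polynomial_kernel(u, v), rbf_kernel(u, v), sigmoid_kernel(u, v))
```

     The RBF kernel has a single shape parameter, `gamma`, besides the SVM penalty term, which makes its parameter search smaller than that of the polynomial kernel.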
     10 For TCM syndrome diagnosis, the BP network learns slowly, generalizes poorly and is prone to falling into local minima. In addition, the feature vectors of TCM syndromes are difficult to obtain and diagnostic accuracy is not high. Its practical effect is therefore relatively poor and it is difficult to promote.
     11 The RBF neural network learns faster than the BP neural network and its parameters are simpler to set. It classifies and predicts TCM syndrome data well, is more robust, and is applicable to TCM syndrome research.
     12 The PNN has few parameters and runs fast. It is quite robust. Its classification and prediction accuracy are fairly high, second only to SVM. It generalizes well and recognizes the classification information in TCM syndrome data, giving good results in syndrome classification and prediction. It is therefore worth promoting in TCM syndrome classification research.
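     The remark that the PNN has few parameters can be made concrete with a minimal sketch of the technique: each class is scored by the average of Gaussian windows centred on its training cases, and the only parameter to tune is the smoothing width `sigma`. The code assumes equal class priors and uses invented toy data; it is not the MATLAB toolbox implementation used in the study.

```python
import math


def pnn_predict(x, train_X, train_y, sigma=0.5):
    """Classify x with a probabilistic neural network (Parzen windows)."""
    sums, counts = {}, {}
    for xi, yi in zip(train_X, train_y):
        d2 = sum((a - b) ** 2 for a, b in zip(x, xi))
        # Pattern layer: one Gaussian unit per training case.
        sums[yi] = sums.get(yi, 0.0) + math.exp(-d2 / (2 * sigma ** 2))
        counts[yi] = counts.get(yi, 0) + 1
    # Summation layer: average activation per class; decision layer: argmax.
    scores = {c: sums[c] / counts[c] for c in sums}
    return max(scores, key=scores.get)


# Toy data: two well-separated groups labelled 0 and 1.
train_X = [[0, 0], [0, 1], [1, 0], [5, 5], [5, 6], [6, 5]]
train_y = [0, 0, 0, 1, 1, 1]
print(pnn_predict([0.2, 0.3], train_X, train_y))
print(pnn_predict([5.5, 5.2], train_X, train_y))
```

     There is no iterative training: the training cases themselves form the network, which is consistent with the fast running speed noted above.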
