摘要
目的为降低中医体质传统分类方法主观性误差,兼顾兼夹体质,提出基于信息增益的中医体质多标记分类方法。方法采用多标记方法进行中医体质分类。为解决多标记分类方法中不同特征对分类标签的影响不同的问题,通过体质分类数据计算各特征项的信息增益,计算体质分类特征对分类标签的权重,进而通过加权的多标签分类器,得出体质数据多标记分类。结果与传统判别分析法相比,基于信息增益的多标记分类方法在1-错误率(16.33%)、汉明损失(15.44%)、平均准确率(82.61%)方面均有一定优势。结论基于信息增益的多标记分类方法在保证准确率同时可兼顾兼夹体质,实现对体质特征差异性及趋同性的更好描述。
Objective To propose a multi-label classification method of TCM constitutions based on information gain; To reduce the subjective error of traditional classification methods of TCM constitutions and take into account the combination of constitutions. Methods The multi-label method was used to classify TCM constitutions. In order to solve the problem that different features of multi-label classification method had different influence on the classification label, the information gain of each feature item was calculated by the physique classification data, and the weight of classification features were calculated. Then multi-label classification of physique data was obtained by weighted multi-label classifier. Results Compared with the traditional discriminant analysis method, the multi-label classification method based on information gain had certain advantages in 1-error rate(16.33%), hamming loss(15.44%), and average accuracy(82.61%). Conclusion The multi-label classification method based on information gain can ensure the accuracy. Taking into account the combination of constitutions can realize the better description of the difference in constitution characteristics and convergence.
引文
[1]危凌云,李灿东,黄文金,等.中医体质类型分布及兼杂规律研究[J].山东中医药大学学报,2016,40(2):102-104.
[2]彭长根,丁红发,朱义杰,等.隐私保护的信息熵模型及其度量方法[J].软件学报,2016,27(8):1891-1903.
[3]李学明,李海瑞,薛亮,等.基于信息增益与信息熵的TFIDF算法[J].计算机工程,2012,38(8):37-40.
[4]陈科文,张祖平,龙军.文本分类中基于熵的词权重计算方法研究[J].计算机科学与探索,2016,10(9):1299-1309.
[5]ZHANG M L,ZHOU Z H.ML-KNN:A lazy learning approach to multilabel learning[J].Pattern Recognition,2007,40(7):2038-2048.
[6]广凯,潘金贵.一种基于向量夹角的k近邻多标记文本分类算法[J].计算机科学,2008,35(4):205-206.
[7]张顺,张化祥.用于多标记学习的K近邻改进算法[J].计算机应用研究,2011,28(12):4445-4446.
[8]GUO G,WANG H,BELL D,et al.KNN model-based approach in classification[C]//OTM Confederated International Conferences on the Move To Meaningful Internet Systems.Berlin,Heidelberg:Springer,2003:986-996.
[9]李峰,苗夺谦,张志飞,等.基于互信息的粒化特征加权多标签学习k近邻算法[J].计算机研究与发展,2017,54(5):1024-1035.
[10]潘主强,张林,张磊,等.中医临床疾病数据多标记分类方法研究[J].计算机科学与探索,2017,12(8):1295-1304.
[11]郝春风,王忠民.一种用于大规模文本分类的特征表示方法[J].计算机工程与应用,2007,43(15):170-172.
[12]冯雪东.基于一对一分解的多标签分类算法研究[D].南京:南京师范大学,2013.
[13]龚静,黄欣阳.基于隐性语义索引的多标签文本分类集成方法[J].计算机工程与设计,2017,38(9):2556-2561.