Study on Multi-label Classification Method of TCM Constitutions Based on Information Gain
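  • English title: Study on Multi-label Classification Method of TCM Constitutions Based on Information Gain
  • Author: 吕庆莉
  • Author (English): LYU Qingli; Basic Medical College, Shaanxi University of Chinese Medicine
  • Keywords: TCM constitution classification; information gain; multi-label classification
  • Keywords (English): TCM constitutions; information gain; multi-label classification
  • Journal title (Chinese): XXYY
  • Journal title (English): Chinese Journal of Information on Traditional Chinese Medicine
  • Institution: Basic Medical College, Shaanxi University of Chinese Medicine
  • Publication date: 2019-05-31
  • Publisher: Chinese Journal of Information on Traditional Chinese Medicine (中国中医药信息杂志)
  • Year: 2019
  • Issue: v.26; No.299
  • Funding: National Natural Science Foundation of China (81503195); Key Laboratory Project of the Shaanxi Provincial Department of Education (16JS025); Project of the Shaanxi Provincial Department of Science and Technology (2014k14-02-02)
  • Language: Chinese
  • Pages: XXYY201906020
  • Page count: 4
  • CN: 06
  • ISSN: 11-3519/R
  • Classification number: 102-105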
Abstract
Objective: To propose a multi-label classification method for TCM constitutions based on information gain, in order to reduce the subjective error of traditional TCM constitution classification methods while accounting for combined constitutions. Methods: A multi-label method was used to classify TCM constitutions. Because different features influence the classification labels to different degrees in multi-label classification, the information gain of each feature item was computed from the constitution classification data and used to derive the weight of each constitution feature with respect to the classification labels; a weighted multi-label classifier then produced the multi-label classification of the constitution data. Results: Compared with the traditional discriminant analysis method, the information-gain-based multi-label classification method showed some advantage in one-error (16.33%), Hamming loss (15.44%), and average accuracy (82.61%). Conclusion: The information-gain-based multi-label classification method maintains accuracy while accounting for combined constitutions, and gives a better description of both the differences and the convergence among constitution characteristics.
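The abstract does not spell out the weighting formula or the classifier, so the following is only a minimal sketch of the general idea, not the author's implementation. It assumes discretised feature values, averages each feature's information gain over all labels and normalises the result to obtain weights, and then uses a weighted-distance k-nearest-neighbour vote as a stand-in for the "weighted multi-label classifier". The function names, the majority-vote threshold, and the toy data are all hypothetical.

```python
import math
from collections import Counter

def entropy(values):
    """Shannon entropy (in bits) of a sequence of discrete values."""
    n = len(values)
    return -sum((c / n) * math.log2(c / n) for c in Counter(values).values())

def information_gain(feature_col, label_col):
    """IG = H(label) - H(label | feature) for one feature and one label."""
    n = len(label_col)
    groups = {}
    for f, y in zip(feature_col, label_col):
        groups.setdefault(f, []).append(y)
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(label_col) - conditional

def feature_weights(X, Y):
    """Assumed scheme: mean information gain over labels, normalised to sum to 1."""
    n_features, n_labels = len(X[0]), len(Y[0])
    gains = []
    for j in range(n_features):
        col = [row[j] for row in X]
        total = sum(information_gain(col, [y[k] for y in Y]) for k in range(n_labels))
        gains.append(total / n_labels)
    s = sum(gains)
    if s == 0:  # no feature is informative: fall back to uniform weights
        return [1.0 / n_features] * n_features
    return [g / s for g in gains]

def predict(X, Y, weights, query, k=3):
    """Weighted-distance kNN: assign a label if more than half of the k neighbours carry it."""
    def dist(a, b):
        return math.sqrt(sum(w * (p - q) ** 2 for w, p, q in zip(weights, a, b)))
    nearest = sorted(range(len(X)), key=lambda i: dist(X[i], query))[:k]
    return [int(2 * sum(Y[i][lab] for i in nearest) > k) for lab in range(len(Y[0]))]

def hamming_loss(Y_true, Y_pred):
    """Fraction of (sample, label) pairs that are predicted incorrectly."""
    pairs = [(t, p) for yt, yp in zip(Y_true, Y_pred) for t, p in zip(yt, yp)]
    return sum(t != p for t, p in pairs) / len(pairs)

# Hypothetical toy data: 3 discretised features, 2 constitution labels per sample.
X = [[0, 0, 1], [0, 1, 1], [1, 0, 1], [1, 1, 1]]
Y = [[0, 0], [0, 1], [1, 0], [1, 1]]

weights = feature_weights(X, Y)                    # constant third feature gets weight 0
prediction = predict(X, Y, weights, [1, 0, 2], k=1)
```

In this toy set the third feature is constant, so its information gain is zero and it drops out of the distance; a sample may receive zero, one, or several labels, which is how combined constitutions would be represented under these assumptions.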
