A Multiclass Classification Method for Microarray Data Based on Iterative Extension Error-Correcting Output Codes
  • English title (as published): Microarray Data Multiple Classification Method Based on Iterative Extension Error Correct Output Code
  • Authors: ZHONG Tianyun (钟天云); LIU Kunhong (刘昆宏); WANG Beizhan (王备战)
  • Keywords: microarray; error-correcting output codes (ECOC); multiclass classification algorithm; cancer gene; data complexity
  • Journal (Chinese abbreviation): XDZK
  • Journal (English name): Journal of Xiamen University (Natural Science)
  • Institution: Software School of Xiamen University
  • Publication date: 2018-05-28
  • Publisher: Journal of Xiamen University (Natural Science) (厦门大学学报(自然科学版))
  • Year: 2018
  • Issue: v.57; No.264
  • Funding: National Natural Science Foundation of China (61502402, 61772023); Natural Science Foundation of Fujian Province (2016J01320, 2015J05129)
  • Language: Chinese
  • Record ID: XDZK201803018
  • Page count: 8
  • Issue number: 03
  • CN: 35-1070/N
  • Pages: 106-113
Abstract
Microarray technology makes it possible to measure large numbers of genes quickly, and there is an urgent need to use it to improve disease diagnosis. Research on microarray data analysis has therefore developed rapidly, with multiclass classification being especially prominent. However, microarray data have many features and few samples, so traditional statistical learning methods classify them poorly. To address multiclass classification under these conditions, we propose an iterative extension error-correcting output coding (IE-ECOC) algorithm. On several feature subsets, and guided by feature-related data complexity measures, a binary-tree-based coding method generates a pool of candidate columns, and a column selection strategy constructs the coding matrix from that pool. The matrix is then extended iteratively according to validation results. Classification experiments on cancer gene microarray data show that IE-ECOC is well suited to data with many features and few samples and, compared with several classical ECOC algorithms, can produce better results. The experiments also verify the effectiveness of the extension step.
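As a rough illustration of the general ECOC idea behind the abstract, the sketch below decodes binary classifier outputs by Hamming distance and appends one column from a candidate pool to a coding matrix. It is a minimal sketch under stated assumptions, not the authors' IE-ECOC procedure: the names `hamming`, `decode`, and `extend`, the toy matrix, and the pool are all hypothetical, and the rule "take the first pool column that separates a confused class pair" merely stands in for the paper's data-complexity-guided pool generation and column selection, which the abstract does not specify.

```python
from typing import Dict, List, Sequence, Tuple

Codeword = List[int]  # one entry (+1 or -1) per binary classifier


def hamming(a: Sequence[int], b: Sequence[int]) -> int:
    """Count the positions where two equal-length codewords differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(outputs: Sequence[int], matrix: Dict[str, Codeword]) -> str:
    """Return the class whose codeword is nearest to the classifier outputs."""
    return min(matrix, key=lambda label: hamming(outputs, matrix[label]))


def extend(
    matrix: Dict[str, Codeword],
    pool: List[Dict[str, int]],
    pair: Tuple[str, str],
) -> Dict[str, Codeword]:
    """Append the first pool column that separates the given class pair.

    Assumption: `pair` is the class pair that a validation run confused most.
    If no pool column separates it, an unchanged copy is returned.
    """
    a, b = pair
    for column in pool:
        if column[a] != column[b]:
            return {label: bits + [column[label]] for label, bits in matrix.items()}
    return {label: list(bits) for label, bits in matrix.items()}


# Toy 3-class coding matrix: rows are classes, columns are binary problems.
matrix = {"A": [1, 1], "B": [1, -1], "C": [-1, -1]}

# Hypothetical candidate columns (class -> +1/-1 assignment).
pool = [
    {"A": 1, "B": 1, "C": -1},   # does not separate A from B
    {"A": 1, "B": -1, "C": 1},   # separates A from B
]

before = hamming(matrix["A"], matrix["B"])
extended = extend(matrix, pool, ("A", "B"))
after = hamming(extended["A"], extended["B"])
prediction = decode([1, -1, -1], extended)
```

In this toy case the appended column raises the Hamming distance between the codewords of A and B, which is the usual motivation for lengthening an ECOC matrix: codewords that are further apart tolerate more binary classifier errors before decoding picks the wrong class.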
