Abstract
The application of genetic testing technology has accumulated a large amount of gene expression data from different platforms. Because classification models built with traditional methods are difficult to transfer across platforms, this paper proposes k-HRT, a gene expression data classification algorithm based on the Hierarchy Rule Tree (HRT). Data conversion and rule pre-screening strategies are designed to enable fast rule mining, which addresses the large-scale data problems caused by cross-platform characteristics. Experimental results on real gene expression datasets show that k-HRT effectively improves classification accuracy compared with the k-TSP and SVM-RFE algorithms.
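For context on the cross-platform problem, the k-TSP baseline named above classifies a sample using only within-sample comparisons of gene pairs ("is gene i expressed below gene j?"), so it does not depend on a platform's absolute measurement scale. The Python sketch below illustrates that pairwise-rule idea only; it is not the k-HRT algorithm. It is simplified under stated assumptions: the function names (`pair_scores`, `top_k_disjoint`, `predict`), the greedy selection of gene-disjoint pairs, the unweighted majority vote, the fixed `k`, and the toy data are all hypothetical, and it omits the secondary tie-breaking score and the cross-validated choice of `k` used in the published k-TSP.

```python
import math
from itertools import combinations
from typing import List, Sequence, Tuple

# (score, gene i, gene j, class voted for when x_i < x_j); illustrative layout only
Pair = Tuple[float, int, int, int]


def pair_scores(samples: Sequence[Sequence[float]], labels: Sequence[int]) -> List[Pair]:
    """Score every gene pair by |P(x_i < x_j | y=0) - P(x_i < x_j | y=1)|."""
    class0 = [s for s, y in zip(samples, labels) if y == 0]
    class1 = [s for s, y in zip(samples, labels) if y == 1]
    scores: List[Pair] = []
    for i, j in combinations(range(len(samples[0])), 2):
        p0 = sum(s[i] < s[j] for s in class0) / len(class0)
        p1 = sum(s[i] < s[j] for s in class1) / len(class1)
        # If x_i < x_j is at least as frequent in class 0, that comparison votes for class 0.
        scores.append((abs(p0 - p1), i, j, 0 if p0 >= p1 else 1))
    scores.sort(key=lambda t: t[0], reverse=True)  # stable sort: ties keep pair order
    return scores


def top_k_disjoint(scores: List[Pair], k: int) -> List[Pair]:
    """Greedily keep up to k best-scoring pairs in which no gene appears twice."""
    chosen: List[Pair] = []
    used = set()
    for pair in scores:
        _, i, j, _ = pair
        if i in used or j in used:
            continue
        chosen.append(pair)
        used.update((i, j))
        if len(chosen) == k:
            break
    return chosen


def predict(rules: List[Pair], sample: Sequence[float]) -> int:
    """Unweighted majority vote of the pairwise rules (odd k avoids ties)."""
    votes = [v if sample[i] < sample[j] else 1 - v for _, i, j, v in rules]
    return int(sum(votes) * 2 > len(votes))


# Hypothetical toy data: 4 genes; the classes differ in the order of genes (0, 1) and (2, 3).
TRAIN = [
    [1.0, 5.0, 2.0, 9.0],
    [2.0, 6.0, 1.0, 8.0],
    [1.5, 4.0, 3.0, 7.0],
    [6.0, 1.0, 9.0, 2.0],
    [7.0, 2.0, 8.0, 1.0],
    [5.0, 1.5, 7.0, 3.0],
]
LABELS = [0, 0, 0, 1, 1, 1]

rules = top_k_disjoint(pair_scores(TRAIN, LABELS), k=3)
print(rules)  # only two gene-disjoint pairs exist here: (0, 1) and (2, 3)
new_sample = [8.0, 3.0, 9.5, 2.5]
print(predict(rules, new_sample))
# A strictly increasing transform of the whole sample (here log2) keeps every comparison.
print(predict(rules, [math.log2(v) for v in new_sample]))
```

Because each rule only checks whether x_i < x_j inside a single sample, any strictly increasing transformation applied to all values of that sample (a rescaling, a global offset, a log) leaves the prediction unchanged. This is the property that makes rank-based pairwise rules attractive when data come from different platforms.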