Abstract
Sure independence screening (SIS) is used to reduce the dimension of a high-dimensional data set of 7,126 genes from 36 leukemia patients, and Lasso variable selection is then applied to pick out the possible pathogenic genes. A generalized linear model (a Logistic model) is built according to the data type of the response variable. The optimal model is obtained by comparing the fitted-probability plots produced under the AIC and BIC criteria and under cross-validation (CV).
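The screening step can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes the marginal-correlation form of SIS, a retained-set size of floor(n / log n) (a commonly used default), and synthetic data with two artificially informative columns. The function names `screening_size`, `marginal_correlation`, and `sis_screen` are hypothetical, and the Lasso-penalized Logistic fit that would follow on the retained columns is not shown.

```python
import math
import random


def screening_size(n):
    """Number of features to keep: floor(n / log n), an assumed default."""
    return max(1, int(n / math.log(n)))


def marginal_correlation(x_col, y):
    """Pearson correlation between one feature column and the response."""
    n = len(y)
    mean_x = sum(x_col) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x_col, y))
    sxx = sum((a - mean_x) ** 2 for a in x_col)
    syy = sum((b - mean_y) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return 0.0  # a constant column carries no marginal signal
    return sxy / math.sqrt(sxx * syy)


def sis_screen(X, y, d=None):
    """Return the sorted indices of the d features with the largest
    absolute marginal correlation with y."""
    n, p = len(y), len(X[0])
    if d is None:
        d = screening_size(n)
    scores = [abs(marginal_correlation([row[j] for row in X], y))
              for j in range(p)]
    ranked = sorted(range(p), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:d])


# Synthetic stand-in for the gene-expression matrix (assumption):
# 36 samples, 500 noise features, columns 3 and 7 shifted by class label.
rng = random.Random(0)
n, p = 36, 500
y = [0] * 18 + [1] * 18
X = [[rng.gauss(0.0, 1.0) for _ in range(p)] for _ in range(n)]
for i in range(n):
    for j in (3, 7):
        X[i][j] += 6.0 * y[i]

selected = sis_screen(X, y)
print(selected)
```

With 36 samples the assumed rule retains 10 features, so a penalized Logistic model would be fitted on 10 columns rather than on thousands.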