LASSO-Based FDR Control Methods and Their Application in Survival Analysis of High-Dimensional Data
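  • English title: LASSO-based Methods with the False Discovery Rate Control and the Application in Survival Analysis of High-dimensional Data
  • Authors: Xu Shuhong (许树红); Dong Xiaoqiang (董晓强); Tao Ran (陶然); Gao Xue (高雪); Gao Qian (高倩); Yu Mingxing (虞明星); Wang Tong (王彤)
  • Authors (English, as listed): Xu Shuhong; Dong Xiaoqiang; Tao Ran; Department of Health Statistics, School of Public Health, Shanxi Medical University
  • Keywords: LASSO; survival analysis; tuning parameter; false discovery rate
  • Journal (Chinese title code): ZGWT
  • Journal (English title): Chinese Journal of Health Statistics
  • Affiliations: Department of Health Statistics, Shanxi Medical University; Shaoxing Center for Disease Control and Prevention
  • Publication date: 2018-06-25
  • Publisher: Chinese Journal of Health Statistics (中国卫生统计)
  • Year: 2018
  • Issue: v.35
  • Funding: National Natural Science Foundation of China (81473073)
  • Language: Chinese
  • Page: ZGWT201803001
  • Page count: 8
  • CN: 03
  • ISSN: 21-1153/R
  • Classification code: 4-11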
Abstract
Objective: To compare four methods based on the LASSO-Cox model, namely cross-validation (CV), penalized cross-validated log-likelihood (pcvl), the extended Bayesian information criterion (EBIC), and stability selection, in terms of false discovery rate (FDR) control and variable selection performance.
Methods: A simulation study evaluated the FDR and positive selection rate (PSR) of each method under different censoring proportions, different degrees of correlation among covariates, and different sparsity levels of the regression coefficients. A diffuse large B-cell lymphoma (DLBCL) data set downloaded from GEO was then analyzed to identify genes associated with prognosis.
Results: In the simulations, stability selection controlled the FDR better than the other methods across censoring proportions, correlation levels, and sparsity levels, while retaining relatively high variable selection power. EBIC performed well when the correlation among covariates was low and the coefficients were relatively sparse, but it was conservative when the sample size was small. The pcvl method rarely missed variables with a true effect, yet its FDR remained relatively high. In the real data analysis, EBIC selected only one gene; most of the genes selected by stability selection were statistically significant and overlapped substantially with those selected by the other methods.
Conclusion: In LASSO-Cox survival analysis of high-dimensional data, stability selection controls the FDR well while maintaining relatively high variable selection power.
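To make the two evaluation measures concrete, the following is a minimal sketch, assuming the usual simulation definitions: FDR as the share of selected variables that have no true effect, and PSR as the share of truly effective variables that are selected. The abstract does not spell these formulas out, and the function name `fdr_and_psr` and the example indices are hypothetical.

```python
def fdr_and_psr(selected, truly_active):
    """Return (FDR, PSR) for one simulated data set.

    selected:      indices of variables chosen by a selection method
    truly_active:  indices of variables with a nonzero true coefficient
    These definitions are assumed, not quoted from the paper.
    """
    selected = set(selected)
    truly_active = set(truly_active)

    true_positives = len(selected & truly_active)
    false_positives = len(selected - truly_active)

    # Convention: FDR is taken as 0 when no variable is selected.
    fdr = false_positives / len(selected) if selected else 0.0
    # Convention: PSR is taken as 0 when there are no truly active variables.
    psr = true_positives / len(truly_active) if truly_active else 0.0
    return fdr, psr


# Hypothetical example: variables 0-4 carry a true effect,
# and a method selects variables 0, 1, 2 and 7.
fdr, psr = fdr_and_psr(selected=[0, 1, 2, 7], truly_active=[0, 1, 2, 3, 4])
print(f"FDR = {fdr:.2f}, PSR = {psr:.2f}")  # FDR = 0.25, PSR = 0.60
```

In a simulation study of this kind, these two values would typically be averaged over many replicated data sets for each method and scenario.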
