Interaction Effects of Multiple SNPs in the NER Pathway on Lung Cancer Susceptibility
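Abstract
Background
     Lung cancer is among the malignant tumors that affect the most people worldwide. Population-based epidemiological studies and basic etiological research have established that lung cancer arises from the joint action of genetic and environmental factors. The body's defense mechanisms play an important protective role during the development of lung cancer. At least 130 DNA repair genes are currently known in humans. Polymorphisms in these genes may alter DNA repair capacity and thereby increase an individual's risk of lung cancer. The nucleotide excision repair (NER) pathway is one important DNA repair system, and the association between NER pathway gene polymorphisms and lung cancer susceptibility is an active research topic. However, because of biological factors and differences in study design and statistical analysis, association studies of the same question often give inconsistent results. The main reason is that limited sample sizes make it impossible to examine every possible association thoroughly, so each study analyzes the problem from only one angle. Molecular epidemiology is currently most concerned with gene-gene and gene-environment interactions in disease. Besides the traditional multiple logistic regression model, the methods reported for joint analysis of multiple genes include data mining approaches such as multifactor dimensionality reduction (MDR) and classification and regression trees (CART). Each method has its own strengths and weaknesses, and their results and usefulness remain open to discussion. Association rule analysis is regarded as an effective tool for extracting novel, previously unknown knowledge from large amounts of data, and it can provide useful information about complex associations among attributes and attribute combinations. We therefore use association rules to screen gene polymorphism data with a relatively large sample size, in order to supply effective candidate covariates (genes) for the logistic regression model used in the subsequent analysis.
     Objective
     To study the relationship between single nucleotide polymorphisms (SNPs) in NER pathway genes and lung cancer susceptibility, to identify interactions among multiple SNPs that affect lung cancer, and to find suitable screening and analysis methods.
     Methods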
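     Given the characteristics of the NER pathway SNP data, we defined association rule screening criteria and combined them with the Bootstrap technique to select SNP combinations that may carry interaction information for lung cancer. Logistic regression models were then used for confirmatory regression analysis and testing. To provide preliminary evidence that this approach works, we ran a small-scale random simulation: we set the parameters of a simulation model and generated simulated data similar in nature to the real data, analyzed the simulated data with the screening and analysis strategy above and the corresponding statistical methods, and compared the results with the parameters set in the simulation model. This served to verify the feasibility of the approach and to examine the applicability of a related method (MDR). The SNP data were simulated in MATLAB 7.0 from a specified biological background and a random simulation model with preset parameters. The corresponding case-control disease status was simulated in SAS 9.13 according to the simulation model.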
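     Association rules were mined with the classical Apriori algorithm. Lift, the P value of Fisher's exact test, support, and confidence were chosen as objective measures for evaluating rules. Different thresholds on these measures defined the candidate screening criteria. Each criterion was evaluated by its results on the simulated data, and the most effective one was selected for the analysis of the real data. The association rule analysis was programmed in SAS 9.13.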
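     Evaluation indices: (1) For the screening criteria: the mean frequency (MF) with which the rule sets from 100 simulated samples contained the variables and interaction terms preset in the simulation model, its standard error (SE), the 95% confidence interval (CI) of MF, and the total number of rules selected. (2) For the model: the bias of the logistic regression parameter estimates (Bias), the degree of bias (DB), and the coverage of the 95% confidence interval (Coverage).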
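     Results
     The preliminary simulation analysis showed that association rule analysis can indeed uncover potential associations among variables in large data sets, including interactions between variables. Screening criteria built on Lift and the Fisher's exact P value, combined with Bootstrap sampling, effectively selected rules containing the variables preset in the simulation model. To keep the success rate of rule mining high, the minimum support (min_sup) and minimum confidence (min_conf) should be set relatively low. Bootstrap sampling made the association rule results more stable and reliable. The MDR results on the simulated data suggest that MDR cannot identify interactions in the true sense, and its misuse should be avoided.
     Analysis of the real data identified two polymorphic sites associated with lung cancer susceptibility, XPG-rs732321 and DDB2-rs830083, and two interaction terms, ERCC1-rs3212930×ERCC1-rs3212951 and ERCC2-rs13181×XPG-rs873601. The variant genotype (CC+AC) of XPG-rs732321 was protective against lung cancer (OR=0.54, 95% CI=0.35~0.85). The variant genotype (GG+CG) of DDB2-rs830083 was a risk genotype for lung cancer (OR=1.32, 95% CI=1.03~1.70). The ERCC1 sites rs3212930 and rs3212951 acted synergistically on lung cancer (OR=2.75, 95% CI=1.18~6.64): individuals carrying variant genotypes at both sites had a higher risk of lung cancer than those with a variant at only one site. ERCC2-rs13181 and XPG-rs873601 also interacted (OR=2.43, 95% CI=1.09~5.44): individuals with variants at both sites had a higher risk of lung cancer than those with a variant at only one site or at neither site.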
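     Conclusion
     Screening criteria based on the objective rule measures support, confidence, Lift, and the Fisher's exact P value, combined with Bootstrap sampling, can effectively select valuable association rules and reveal potential associations among variables in the data, including interactions between variables. When the selected SNPs and SNP combinations are used as candidate covariates in a multivariable logistic regression model of disease, and the power is sufficiently high, disease-related SNPs and SNP-SNP interactions can be found effectively. Applying this approach to real SNP data from the NER pathway identified two SNP sites and two interaction terms associated with lung cancer susceptibility. All of these positive findings have a reasonable explanation in terms of genetics.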
Background
     Lung cancer is one of the most serious malignant tumors, with high incidence and mortality rates. Epidemiological and population studies have confirmed that both genetic and environmental factors are involved in the etiology of lung cancer. The body's defense mechanisms play an important protective role during the development of lung cancer. The human body is known to have at least 130 DNA repair genes. Polymorphisms in these genes may change DNA repair capacity and thereby increase an individual's risk of lung cancer. The NER pathway is one of the important DNA repair pathways, and the relationship between NER pathway gene polymorphisms and susceptibility to lung cancer is currently an active research topic. However, results from similar studies are often inconsistent. The main reason is limited sample size: it is impossible to analyze all possible relationships, so each study covers only part of them. At present, molecular epidemiology is most concerned with gene-gene and gene-environment interactions. Besides the traditional multiple logistic regression model, the methods reported for analyzing multiple SNPs include multifactor dimensionality reduction (MDR), classification and regression trees (CART), and other data mining methods. All of these methods have their own advantages and limitations, and their results and effects raise many questions worth discussing. Association rule mining is considered an effective tool for screening novel or unknown knowledge and information from large amounts of data, so it can be used to find valuable information about relationships between attributes in large SNP data sets. This information helps select candidate covariates (genes) for the subsequent logistic regression model.
     Objective
     The objective is to study the relationship between NER pathway gene polymorphisms and susceptibility to lung cancer, to find interactions between SNPs related to lung cancer susceptibility, and to find suitable methods for analyzing the relationship between SNPs and disease susceptibility.
     Methods
     Based on the actual SNP data set, we used association rule mining combined with the Bootstrap method to find association rules between SNPs and lung cancer. To confirm these findings, we fitted logistic regression models using the candidate covariates (genes) and interaction information contained in the rules. As a preliminary check of the method, we carried out a small-scale simulation study, setting the parameters of a random simulation model under a biological context matching that of the SNP data. We analyzed the simulated data with the above method and compared the results with those of other methods. The independent variables were simulated in MATLAB 7.0 according to the simulated biological context. The dependent variable (disease state) was simulated in SAS according to the simulation model.
     The classical Apriori algorithm, implemented in SAS 9.13, was used to mine association rules. We selected the following rule interestingness measures: lift, Fisher's exact probability, support, and confidence. By varying the thresholds on these measures, we chose the most effective criterion for screening association rules in the analysis of the actual data.
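     The four measures named above can be illustrated for a single rule of the form "SNP pattern → case". The following is a minimal sketch, not the SAS implementation used in the study: the function name, the count arguments, and the choice of a one-sided Fisher test are assumptions made for illustration.

```python
from math import comb

def rule_measures(n_both, n_antecedent, n_consequent, n_total):
    """Support, confidence, lift and a one-sided Fisher exact P value
    for a rule "antecedent -> consequent" (hypothetical helper).

    n_both       : records matching antecedent AND consequent
    n_antecedent : records matching the antecedent (e.g. an SNP pattern)
    n_consequent : records matching the consequent (e.g. cases)
    n_total      : all records
    """
    support = n_both / n_total
    confidence = n_both / n_antecedent
    lift = confidence / (n_consequent / n_total)

    # One-sided Fisher exact test (assumption: upper tail only):
    # probability of seeing at least n_both co-occurrences under the
    # hypergeometric distribution implied by the fixed margins.
    upper = min(n_antecedent, n_consequent)
    tail = sum(
        comb(n_consequent, k) * comb(n_total - n_consequent, n_antecedent - k)
        for k in range(n_both, upper + 1)
    )
    p_value = tail / comb(n_total, n_antecedent)
    return support, confidence, lift, p_value


# Illustrative counts only, not data from the study.
support, confidence, lift, p_value = rule_measures(30, 40, 100, 200)
```

     A screening criterion in this spirit would keep a rule only if its lift exceeds a chosen threshold and its P value falls below a chosen level, while min_sup and min_conf bound support and confidence from below.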
     Evaluation indices: (1) for the rule selection criteria: the mean frequency (MF) with which the variables and interactions specified in the simulation model appeared in the resulting rules, its standard error (SE) and 95% confidence interval (CI), and the total number of rules; (2) for the model: the bias of the logistic regression parameter estimates (Bias), the degree of bias (DB), and the coverage of the 95% confidence interval (Coverage).
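     The model evaluation indices can be sketched for one parameter of the simulation model. The abstract does not define the degree of bias; the sketch below assumes it is the bias relative to the true parameter value, and the function and argument names are hypothetical.

```python
from statistics import mean

def evaluate_estimates(estimates, intervals, true_value):
    """Bias, degree of bias and CI coverage across simulated samples.

    estimates  : one parameter estimate per simulated sample
    intervals  : matching (lower, upper) 95% CI limits
    true_value : value set in the simulation model (must be non-zero)
    """
    bias = mean(estimates) - true_value
    # Assumption: degree of bias = bias relative to the true value.
    degree_of_bias = bias / true_value
    # Coverage: share of intervals that contain the true value.
    covered = sum(1 for lower, upper in intervals if lower <= true_value <= upper)
    coverage = covered / len(intervals)
    return bias, degree_of_bias, coverage


# Illustrative numbers only, not results from the study.
bias, db, coverage = evaluate_estimates(
    [0.9, 1.1, 1.3],
    [(0.5, 1.2), (0.8, 1.5), (1.05, 1.6)],
    true_value=1.0,
)
```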
     Results
     The small-scale simulation study showed that association rule mining is indeed a useful tool for finding potential associations between variables in large amounts of data, including interactions between variables. Using Fisher's exact probability and lift as rule interestingness measures, combined with the Bootstrap sampling technique, effectively selected rules that include the variables in the simulation model. To ensure a high mining success rate, the parameters minimum support (min_sup) and minimum confidence (min_conf) should be set relatively low. Applying the Bootstrap technique in association rule mining helps produce robust results. Both the simulation results and an analysis of the MDR method indicated that the interactions found by MDR are not credible.
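     One way to see how Bootstrap sampling can stabilize rule selection is to resample the records with replacement and count how often a candidate rule still meets the criterion. The abstract does not describe the exact procedure, so the sketch below is an assumed variant: the lift threshold, the number of resamples, and all names are hypothetical.

```python
import random

def lift(records):
    """Lift of the rule "carrier -> case" for (carrier, case) boolean pairs."""
    n = len(records)
    n_carrier = sum(1 for carrier, _ in records if carrier)
    n_case = sum(1 for _, case in records if case)
    n_both = sum(1 for carrier, case in records if carrier and case)
    if n_carrier == 0 or n_case == 0:
        return None  # rule undefined in this sample
    return (n_both / n_carrier) / (n_case / n)

def bootstrap_pass_rate(records, min_lift=1.2, n_boot=200, seed=0):
    """Share of bootstrap resamples in which the rule reaches min_lift."""
    rng = random.Random(seed)  # fixed seed for reproducibility
    passed = 0
    for _ in range(n_boot):
        sample = rng.choices(records, k=len(records))  # draw with replacement
        value = lift(sample)
        if value is not None and value >= min_lift:
            passed += 1
    return passed / n_boot


# Illustrative data only: 40 carriers (30 cases), 160 non-carriers (70 cases).
records = (
    [(True, True)] * 30 + [(True, False)] * 10
    + [(False, True)] * 70 + [(False, False)] * 90
)
rate = bootstrap_pass_rate(records)
```

     A rule that passes in most resamples is less likely to be an artifact of one particular sample than a rule that passes only occasionally.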
     Analysis of the actual data showed that the following SNPs and interactions were related to lung cancer susceptibility: XPG-rs732321, DDB2-rs830083, ERCC1-rs3212930×ERCC1-rs3212951, and ERCC2-rs13181×XPG-rs873601. XPG-rs732321 (CC + AC) is a protective genotype for lung cancer (OR = 0.54, 95% CI = 0.35~0.85). DDB2-rs830083 (GG + CG) increases the risk of lung cancer (OR = 1.32, 95% CI = 1.03~1.70). ERCC1-rs3212930 and ERCC1-rs3212951 have a synergistic effect on lung cancer risk (OR = 2.75, 95% CI = 1.18~6.64): individuals carrying variants at both loci have a higher risk of lung cancer than individuals carrying a variant at only one of them. ERCC2-rs13181 and XPG-rs873601 also interact (OR = 2.43, 95% CI = 1.09~5.44): individuals with variants at both sites have a higher risk of lung cancer than those carrying a variant at only one site or at neither site.
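     In a logistic regression model, an odds ratio is the exponential of a coefficient, and for an interaction term it is the exponential of the product-term coefficient. The sketch below shows this relationship using a Wald-type 95% interval. The abstract does not say how its intervals were computed, so the Wald form and the function names are assumptions.

```python
from math import exp, log

def odds_ratio_ci(beta, se, z=1.96):
    """Odds ratio and Wald-type CI from a logistic coefficient and its SE."""
    return exp(beta), exp(beta - z * se), exp(beta + z * se)

def se_from_ci(lower, upper, z=1.96):
    """Standard error on the log-odds scale implied by a Wald-type CI."""
    return (log(upper) - log(lower)) / (2 * z)


# Using the reported ERCC1 interaction (OR = 2.75, 95% CI = 1.18~6.64):
se = se_from_ci(1.18, 6.64)              # implied SE of the coefficient
odds_ratio, lower, upper = odds_ratio_ci(log(2.75), se)
```

     A Wald interval is symmetric around the coefficient on the log scale, so the rebuilt limits match the reported ones only approximately if the reported interval was rounded or computed another way.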
     Conclusion
     Association rule mining, using the rule measures support, confidence, lift, and Fisher's exact probability together with the Bootstrap technique, is useful for finding potential associations between variables in data, including interactions. The SNPs and SNP combinations contained in the rules can be used as candidate covariates (genes) and interaction terms in a multiple logistic regression model of disease on SNPs. If the power is large enough, the method is able to find SNPs and interactions related to lung cancer. In this research, we found two lung cancer susceptibility SNPs and two interactions. All of these positive findings can be reasonably explained from a biological perspective.
