Optimizing decision tree ensembles for gene-gene interaction detection.
Details
  • Author: Assareh, Amin
  • Degree: Doctoral
  • Year: 2012
  • Advisor and committee: Volkert, L. Gwenn (advisor); Volkert, L. Gwenn (committee member); Dragan, Feodor F. (committee member); Jin, Rouming (committee member); Li, Jing (committee member); Ortiz, Joseph D. (committee member)
  • Institution: Kent State University
  • Department: Computer Science
  • ISBN: 9781267818157
  • CBH: 3534606
  • Country: USA
  • Language: English
  • FileSize: 2254768
  • Pages: 149
Abstract
In recent years, genome-wide association studies (GWAS) have been dedicated to unraveling the genetic etiology of complex diseases. It is widely accepted that most common diseases, such as neurodegenerative diseases (e.g., Alzheimer's and Parkinson's diseases), cardiovascular diseases, various cancers, diabetes, and osteoporosis, result from multiple genes, their interactions, environmental factors, and gene-by-environment interactions, and thus cannot be explained by a simple Mendelian inheritance model. Consequently, dissecting the gene-gene and/or gene-environment interactions involved in complex diseases and traits has become an active research topic in computational genomics. However, the high dimensionality of genotype data and the exponential growth of the search space with the order of the targeted interactions make most existing interaction detection strategies practically inapplicable. Because they can capture interactions among input variables in addition to nonlinear effects, decision trees and their ensembles have recently been shown to be effective strategies for detecting interactions in GWAS data. However, an individual decision tree (DT) is subject to major limitations, most importantly high variance error, data fragmentation, and representational problems, which make it unreliable for stand-alone feature selection. Ensemble approaches have been proposed to increase the robustness of weak learners such as DTs by using multiple different and potentially complementary representations of the data. Some limitations of individual decision trees may nevertheless persist at the ensemble level and affect interaction detection performance.
The objectives of this dissertation are to:
  • Study the systematic limitations of individual decision trees which may impact their interaction detection performance and the possible solutions;
  • Investigate the application of decision tree ensembles in interaction detection, with respect to the functional characteristics of the applied ensemble strategy;
  • Compare four well-known ensemble frameworks, namely AdaBoost, LogitBoost, Bagging and Random Forest, and their pros and cons as far as interaction detection is concerned;
  • Provide a unified framework to optimize the application of DT ensembles in interaction detection.
