Data Mining for Discovery of Clinical and Genomic Disease Markers.
Details
  • Author:Gupta, Rohit
  • Degree:Doctorate
  • Year:2010
  • Advisor:Kumar, Vipin (advisor); Groen, Piet de (committee member); Kuang, Rui (committee member); Myers, Chad (committee member); Lim, Kelvin (committee member)
  • Institution:University of Minnesota
  • Department:Computer Science
  • ISBN:9781124953175
  • CBH:3478103
  • Country:USA
  • Language:English
  • FileSize:4275601
  • Pages:152
Abstract
Recent technical advancements have led to the availability of individual-level clinical, genomic and genetic information. This has created both the opportunity and the need to develop new data mining techniques to analyze these enormous amounts of data and thereby shed light not only on the inner biology of complex diseases but also on ways of treating them effectively. These data mining techniques have tremendous potential to uncover important associations between clinical and genomic factors and disease phenotypes (diagnosis, prognosis, therapeutic responses, etc.), thereby playing a crucial role in making personalized medicine a reality. This dissertation first presents a clinical case study and proposes methods to discover clinical markers for the quality of colonoscopy. A unique database of interval and detected colorectal cancers is created using electronic medical records from Mayo Clinic Rochester. Standard statistical tests are used to assess and discuss the role of several physician- and patient-related factors in the development of colorectal cancer despite colonoscopy. A future case-control study to determine any genetic differences between the two groups of patients (interval cancers and detected cancers) is also proposed, and a need for techniques that can find multiple markers (groups of genetic factors and interactions among them) is highlighted. Complex diseases are polygenic in nature, i.e., they are caused and affected by multiple genes and interactions among them. Despite this understanding, most existing approaches that use gene expression data for biomarker discovery employ differential analysis to identify only individual genes. These single markers, although they do provide some information, are less reproducible across studies and seldom explain the underlying mechanism of complex polygenic diseases.
To address these issues, this dissertation proposes error-tolerant pattern mining techniques for discovering coherently expressed groups of genes from microarray gene expression data sets. These approaches can explicitly handle noise and errors in the data and can systematically discover error-tolerant patterns from both binary and real-valued data sets. Although the proposed approaches can be effective and useful in various application domains, in this dissertation they are primarily applied to two biological problems: discovery of functional modules (groups of functionally related genes) and discovery of disease markers. Further, as each biological data source is noisy and provides a different but complementary view, complex problems like biomarker discovery require more information than any individual biological data source provides. With this motivation, some recent approaches focus on discovering biomarkers by combining gene expression measurements with protein-protein interaction (PPI) data; however, they are greedy in nature and do not explicitly handle noise in the data. Therefore, we extended the error-tolerant pattern mining framework to discover patterns using more than one data set. This integrated pattern mining approach turns out to be highly effective for the biomarker discovery problem, since it can systematically and efficiently find multiple-gene biomarkers that not only co-express but also induce a sufficiently connected subgraph in the PPI network. The efficacy of the proposed approach is illustrated by using various breast cancer gene expression data sets and a human protein-protein interaction network to discover subnetwork-based biomarkers. It was shown that these subnetwork biomarkers are more biologically plausible, more reproducible, and, finally, more likely to be true than random.
