Boosting alternating decision trees modeling of disease trait information
  • Authors: Kuang-Yu Liu (1) (2)
    Jennifer Lin (3)
    Xiaobo Zhou (1) (2)
    Stephen TC Wong (1) (2)
  • Journal: BMC Genetics
  • Publication date: December 2005
  • Volume: 6
  • Issue: 1-supp
  • Full-text size: 483KB
  • Author affiliations:

    1. HCNR Center for Bioinformatics, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02215, USA
    2. Department of Radiology, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02215, USA
    3. Division of Preventive Medicine, Brigham and Women's Hospital, Harvard Medical School, Boston, MA, 02215, USA
Abstract
We applied the alternating decision trees (ADTrees) method to the last 3 replicates from the Aipotu, Danacca, Karangar, and NYC populations in the Problem 2 simulated Genetic Analysis Workshop dataset. Using information from the 12 binary phenotypes and sex as input and Kofendrerd Personality Disorder disease status as the outcome of ADTrees-based classifiers, we obtained a new quantitative trait based on average prediction scores, which was then used for genome-wide quantitative trait linkage (QTL) analysis. ADTrees are machine learning methods that combine boosting and decision tree algorithms to generate smaller and easier-to-interpret classification rules. In this application, we compared four modeling strategies formed by combining two boosting loss functions (log or exponential) with two tree generation types (a full alternating decision tree or a classic boosting decision tree). These four strategies were applied to the founders in each population to construct four classifiers, which were then applied to each study participant. To compute an average prediction score for each subject with a specific trait profile, this process was repeated with 10 runs of 10-fold cross-validation, and the standardized prediction scores obtained from the 10 runs were averaged and used in subsequent expectation-maximization Haseman-Elston QTL analyses (implemented in GENEHUNTER) with the approximately 900 SNPs in Hardy-Weinberg equilibrium provided for each population. QTL analyses based on these four models detected evidence for linkage (Z ≥ 1.96, p < 0.01) on chromosomes 1, 3, 5, and 9.
Moreover, using average iteration and abundance scores for the 12 phenotypes and sex as measures of their relevance, we found all relevant phenotypes for all four populations except phenotype b for the Karangar population, with a suggested subgroup structure consistent with the latent traits used in the model. In conclusion, our findings suggest that the ADTrees method may offer a more accurate representation of the disease status that allows for better detection of linkage evidence.
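
The score-averaging step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: it substitutes plain discrete AdaBoost (exponential loss) over single-feature stumps for ADTrees, scores each subject out-of-fold, and standardizes scores within each run before averaging across runs. The function names (`fit_stumps`, `score`, `averaged_scores`), the number of boosting rounds, and the exact fold handling are all assumptions made for illustration.

```python
import math
import random
from statistics import mean, pstdev


def fit_stumps(X, y, rounds=10):
    """Discrete AdaBoost (exponential loss) over single-feature stumps.

    X: list of rows of 0/1 features; y: list of +1/-1 labels.
    Returns a list of (feature index, polarity, alpha) triples.
    NOTE: a stand-in for ADTrees, used here only for illustration.
    """
    n = len(X)
    w = [1.0 / n] * n
    model = []
    for _ in range(rounds):
        best = None
        for j in range(len(X[0])):
            for pol in (1, -1):
                # The stump predicts `pol` when feature j is 1, else `-pol`.
                err = sum(wi for wi, xi, yi in zip(w, X, y)
                          if (pol if xi[j] else -pol) != yi)
                if best is None or err < best[0]:
                    best = (err, j, pol)
        err, j, pol = best
        err = min(max(err, 1e-10), 1 - 1e-10)  # keep the log finite
        alpha = 0.5 * math.log((1 - err) / err)
        model.append((j, pol, alpha))
        # Reweight: misclassified subjects gain weight, then renormalize.
        w = [wi * math.exp(-alpha * yi * (pol if xi[j] else -pol))
             for wi, xi, yi in zip(w, X, y)]
        total = sum(w)
        w = [wi / total for wi in w]
    return model


def score(model, x):
    """Real-valued prediction score: weighted vote of all stumps."""
    return sum(alpha * (pol if x[j] else -pol) for j, pol, alpha in model)


def averaged_scores(X, y, runs=10, folds=10, seed=0):
    """Average of per-run standardized out-of-fold prediction scores."""
    rng = random.Random(seed)
    n = len(X)
    totals = [0.0] * n
    for _ in range(runs):
        idx = list(range(n))
        rng.shuffle(idx)
        raw = [0.0] * n
        for f in range(folds):
            test = idx[f::folds]
            held = set(test)
            train = [i for i in idx if i not in held]
            model = fit_stumps([X[i] for i in train], [y[i] for i in train])
            for i in test:
                raw[i] = score(model, X[i])
        mu = mean(raw)
        sd = pstdev(raw) or 1.0  # guard against constant scores
        for i in range(n):
            totals[i] += (raw[i] - mu) / sd
    return [t / runs for t in totals]
```

Calling `averaged_scores(X, y)` with one row of 13 binary inputs per subject (12 phenotypes plus sex, coded 0/1) and disease status coded as +1/-1 returns one averaged score per subject, which plays the role of the derived quantitative trait in this sketch.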
