Analysis of Pathogenic Genetic Loci Based on Several Machine Learning Algorithms (基于几种机器学习算法的致病遗传基因位点分析)
  • English title: Analysis of pathogenic genetic loci based on several machine learning algorithms
  • Authors: FANG Ya-lan (方雅兰); KU Zai-qiang (库在强)
  • Affiliation: College of Mathematics and Statistics, Huanggang Normal University (黄冈师范学院数学与统计学院)
  • Keywords: SNP locus; Random Forest algorithm; Bagging algorithm; AdaBoost algorithm
  • Journal: Journal of Huanggang Normal University (黄冈师范学院学报)
  • Publication date: 2019-06-10
  • Publisher: Journal of Huanggang Normal University
  • Year: 2019
  • Issue: 03
  • Funding: 2018 Huanggang Normal University Master of Education Teaching Case Project (JYJXAL2018001)
  • Language: Chinese
  • Pages: 7-11
  • Page count: 5
  • CN: 42-1275/G4
  • ISSN: 1003-8078
  • Classification number: TP181; R394
Abstract
The identification and screening of SNP loci in genes has become an increasingly important topic in association studies between complex diseases and genes. This paper first applies a locus classification scheme commonly used in medicine to a gene database for a certain class of disease: the allele frequencies at each locus are counted over the whole sample to determine the major and minor alleles, and the base-pair (A, T, C, G) information at each locus is converted into a numerical code. Second, a chi-square test is used to coarsely screen candidate SNP loci. Finally, several machine learning algorithms (Random Forest, Bagging, AdaBoost, and Lasso Logistic regression) are applied to select the loci on which their results agree, and cross-validation is used to verify the validity of the screening results.
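The first two steps of the abstract (numerical encoding and chi-square screening) can be illustrated with a minimal sketch. It is not the authors' implementation. It assumes each genotype is a two-letter string such as "AT", that each locus is biallelic, that the numerical code is the number of minor-allele copies (0, 1 or 2), and that screening uses the 5% critical value for 2 degrees of freedom. The function names, locus names, and toy data are hypothetical.

```python
from collections import Counter

# Chi-square critical value for 2 degrees of freedom at alpha = 0.05.
# Assumption: a 3-genotype x 2-class table; with fewer observed genotypes
# this threshold is simply more conservative.
CRITICAL_DF2 = 5.991


def encode_locus(genotypes):
    """Encode two-letter genotypes as minor-allele counts (0, 1 or 2)."""
    allele_counts = Counter("".join(genotypes))
    if len(allele_counts) < 2:
        # Monomorphic locus: no minor allele, so every code is 0.
        return [0] * len(genotypes)
    # The minor allele is the least frequent base in the whole sample.
    minor = min(allele_counts, key=allele_counts.get)
    return [g.count(minor) for g in genotypes]


def chi_square(codes, labels):
    """Pearson chi-square statistic of the genotype-code x label table."""
    observed = Counter(zip(codes, labels))
    row_totals = Counter(codes)
    col_totals = Counter(labels)
    n = len(codes)
    stat = 0.0
    for r in row_totals:
        for c in col_totals:
            expected = row_totals[r] * col_totals[c] / n
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


def screen_loci(loci, labels, threshold=CRITICAL_DF2):
    """Return the names of loci whose statistic exceeds the threshold."""
    kept = []
    for name, genotypes in loci.items():
        if chi_square(encode_locus(genotypes), labels) > threshold:
            kept.append(name)
    return kept


# Toy data: 4 cases (label 1) followed by 4 controls (label 0).
labels = [1, 1, 1, 1, 0, 0, 0, 0]
loci = {
    "locus_1": ["TT", "TT", "AT", "AT", "AA", "AA", "AA", "AA"],
    "locus_2": ["CC", "CG", "CC", "CG", "CC", "CG", "CC", "CG"],
}
print(screen_loci(loci, labels))  # prints ['locus_1']
```

In this toy sample, locus_1 separates cases from controls and passes the screen, while locus_2 has the same genotype distribution in both groups and is dropped. The surviving loci would then be passed to the ensemble classifiers named in the abstract.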