Identifying Differentially Methylated Regions Using Sliding Windows and the KNN Algorithm
  • English title: Algorithm of Identifying Differentially Methylated Region Based on Sliding Windows and KNN
  • Authors: LI Huabing (李华兵); YANG Kun (杨昆)
  • Affiliation: School of Computer, Hangzhou Dianzi University
  • Keywords: differentially methylated regions; sliding window; k-nearest neighbor (KNN) classifier; multi-class problem; cluster index
  • Journal: Journal of Hangzhou Dianzi University (Natural Sciences) (journal code: HXDY)
  • Publication date: 2016-07-15
  • Publisher: Journal of Hangzhou Dianzi University (Natural Sciences)
  • Year: 2016
  • Volume/issue: v.36; No.162
  • Funding: National Natural Science Foundation of China (60903086)
  • Language: Chinese
  • Record ID: HXDY201604008
  • Page count: 5
  • Issue: 04
  • CN: 33-1339/TN
  • Pages: 38-42
Abstract
Existing methods for identifying differentially methylated regions (DMRs) tend to discard too many weakly significant methylation sites, restrict the length of the DMRs they report, and cannot handle multi-class problems directly. To address these shortcomings, this paper proposes an algorithm that identifies DMRs between classes using sliding windows and a k-nearest neighbor (KNN) classifier. The algorithm first screens candidate regions by combining a sliding window with a KNN classifier, then merges the candidate regions according to their error rate to obtain DMRs. Experiments on real data show that the algorithm clearly outperforms the control algorithms in classification performance and cluster index, extends the length of the DMRs identified by the Ong control algorithm, and finds DMRs that the Ong algorithm misses.
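The two stages named in the abstract (window screening with KNN, then error-rate-based merging) can be illustrated with a minimal sketch. The abstract does not specify the window width, step, value of k, error threshold, how the error rate is estimated, or the exact merge rule, so everything below is an assumption: the sketch uses a leave-one-out error estimate, a fixed threshold, and a greedy merge of overlapping windows. All function names and the synthetic data are hypothetical and are not taken from the paper.

```python
import numpy as np


def knn_loo_error(X, y, k=3):
    """Leave-one-out error rate of a KNN classifier (assumed error estimate)."""
    # Pairwise Euclidean distances between all samples.
    dist = np.linalg.norm(X[:, None, :] - X[None, :, :], axis=2)
    np.fill_diagonal(dist, np.inf)  # a sample must not vote for itself
    wrong = 0
    for i in range(len(y)):
        neighbours = np.argsort(dist[i], kind="stable")[:k]
        labels, counts = np.unique(y[neighbours], return_counts=True)
        # Majority vote; ties go to the smallest label.
        if labels[np.argmax(counts)] != y[i]:
            wrong += 1
    return wrong / len(y)


def candidate_windows(beta, y, width=5, step=1, max_err=0.1, k=3):
    """Slide a window over the sites; keep windows whose KNN error is low.

    beta: samples x sites matrix of methylation values, sites in genomic order.
    Returns half-open (start, end) site-index intervals.
    """
    n_sites = beta.shape[1]
    kept = []
    for start in range(0, n_sites - width + 1, step):
        if knn_loo_error(beta[:, start:start + width], y, k) <= max_err:
            kept.append((start, start + width))
    return kept


def merge_candidates(beta, y, windows, max_err=0.1, k=3):
    """Greedily merge overlapping or adjacent candidates while the merged
    region still classifies the samples with low error (assumed merge rule)."""
    merged = []
    for start, end in windows:
        if merged and start <= merged[-1][1]:
            m_start = merged[-1][0]
            if knn_loo_error(beta[:, m_start:end], y, k) <= max_err:
                merged[-1] = (m_start, end)
                continue
        merged.append((start, end))
    return merged


def make_toy_data(seed=0):
    """Synthetic three-class data: sites 10..19 differ between classes."""
    rng = np.random.default_rng(seed)
    y = np.repeat([0, 1, 2], 10)
    beta = 0.5 + rng.normal(0.0, 0.05, size=(30, 30))
    class_means = np.array([0.2, 0.5, 0.8])
    beta[:, 10:20] = class_means[y][:, None] + rng.normal(0.0, 0.05, size=(30, 10))
    return np.clip(beta, 0.0, 1.0), y


if __name__ == "__main__":
    beta, y = make_toy_data()
    windows = candidate_windows(beta, y)
    print(merge_candidates(beta, y, windows))
```

Because the windows are scored by a classifier rather than by a per-site significance test, three or more classes are handled in a single pass and weakly informative sites inside a window are not removed individually. Merged regions can also grow beyond a single window width.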
References
[1] YANG Kun, ZHANG Yanbin, DAI Shengdong, et al. Important features of DNA methylation[J]. Acta Biophysica Sinica, 2012, 28(11): 910-922. (In Chinese)
[2] JAFFE A E, MURAKAMI P, LEE H, et al. Bump hunting to identify differentially methylated regions in epigenetic epidemiology studies[J]. International Journal of Epidemiology, 2012, 41(1): 200-209.
[3] SLIEKER R C, BOS S D, GOEMAN J J, et al. Identification and systematic annotation of tissue-specific differentially methylated regions using the Illumina 450k array[J]. Epigenetics & Chromatin, 2013, 6(1): 1-12.
[4] ONG M L, HOLBROOK J D. Novel region discovery method for Infinium 450K DNA methylation data reveals changes associated with aging in muscle and neuronal pathways[J]. Aging Cell, 2014, 13(1): 142-155.
[5] ZHANG Y, ZHANG J. Identification of functionally methylated regions based on discriminant analysis through integrating methylation and gene expression data[J]. Molecular BioSystems, 2015, 11(7): 1786-1793.
[6] ALISCH R S, BARWICK B G, CHOPRA P, et al. Age-associated DNA methylation in pediatric populations[J]. Genome Research, 2012, 22(4): 623-632.
[7] HEYN H, LI N, FERREIRA H J, et al. Distinct DNA methylomes of newborns and centenarians[J]. Proceedings of the National Academy of Sciences, 2012, 109(26): 10522-10527.
[8] TROYANSKAYA O, CANTOR M, SHERLOCK G, et al. Missing value estimation methods for DNA microarrays[J]. Bioinformatics, 2001, 17(6): 520-525.
[9] BOLSHAKOVA N, AZUAJE F. Cluster validation techniques for genome expression data[J]. Signal Processing, 2003, 83(4): 825-833.
