Functional Module Mining in Uncertain Protein-Protein Interaction Networks Based on Fuzzy Spectral Clustering
  • English title: Functional module mining in uncertain protein-protein interaction network based on fuzzy spectral clustering
  • Authors: MAO Yimin; LIU Yinping; LIANG Tian; MAO Dinghui
  • Keywords: uncertain data; Protein-Protein Interaction (PPI); spectral clustering algorithm; Fuzzy C-Means (FCM); functional module; expected density
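  • Journal code: JSJY
  • Journal (English name): Journal of Computer Applications
  • Affiliations: School of Information Engineering, Jiangxi University of Science and Technology; College of Applied Science, Jiangxi University of Science and Technology; 211 Battalion Company Limited, Sino Shaanxi Nuclear Industry Group
  • Publication date: 2018-11-12 13:24
  • Publisher: Journal of Computer Applications (计算机应用)
  • Year: 2019
  • Volume and serial number: v.39; No.344
  • Funding: National Natural Science Foundation of China (41562019); Science and Technology Project of Jiangxi Provincial Department of Education (GJJ161566)
  • Language: Chinese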
  • Issue: 04
  • Pages: 104-112
  • Number of pages: 9
  • CN: 51-1307/TP
  • CNKI file ID: JSJY201904017
Abstract
Existing methods that mine functional modules from Protein-Protein Interaction (PPI) networks by combining spectral clustering with Fuzzy C-Means (FCM) clustering have low accuracy and low running efficiency, and are susceptible to false positives. To address these problems, a method for Functional Module mining in uncertain PPI networks based on Fuzzy Spectral Clustering (FSC-FM) was proposed. First, to overcome the effect of false positives, an uncertain PPI network model was constructed in which every protein-protein interaction is assigned an existence probability computed from the edge clustering coefficient. Second, the similarity calculation in spectral clustering was improved with the Flow distance of Edge Clustering coefficient (FEC) strategy, which combines the edge clustering coefficient with a manifold distance, so that spectral clustering is no longer sensitive to the scaling parameter; spectral clustering was then used to preprocess the uncertain PPI network data, reducing its dimensionality and improving clustering accuracy. Third, a Density-based Probability Center Selection (DPCS) strategy was designed to remove the sensitivity of the FCM algorithm to the initial cluster centers and to the number of clusters, and the preprocessed PPI data were clustered with FCM to improve the running efficiency and sensitivity of the clustering. Finally, the mined functional modules were filtered with an improved Edge-Expected Density (EED) strategy. Experiments on the yeast DIP dataset show that, compared with the Detecting protein Complexes based on Uncertain graph model (DCU) algorithm, FSC-FM increases the F-measure by 27.92% and the running efficiency by 27.92%; compared with the uncertain model-based approach for identifying Dynamic protein Complexes in Uncertain protein-protein interaction Networks (CDUN), the Evolutionary Algorithm (EA) and the Medical Gene or Protein Prediction Algorithm (MGPPA), FSC-FM also achieves a higher F-measure and running efficiency. The experimental results show that FSC-FM is suitable for mining functional modules in uncertain PPI networks.
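The abstract names the pipeline stages but does not give the formulas behind FEC, DPCS or EED. As a rough illustration only, the Python/NumPy sketch below chains common stand-ins for each stage: an assumed edge-clustering-coefficient probability p(u, v) = (z_uv + 1) / min(d_u, d_v), where z_uv counts common neighbours; a probability-weighted normalized-Laplacian embedding in place of the FEC similarity; a simple density-times-distance seeding in place of DPCS; standard FCM updates; and an assumed expected-density filter in place of EED. All function names and thresholds are hypothetical, and this is not the authors' implementation.

```python
import numpy as np


def edge_probabilities(n, edges):
    """ASSUMED ECC-style existence probability (not the paper's exact formula):
    p(u, v) = (z_uv + 1) / min(d_u, d_v), z_uv = number of common neighbours.
    Because z_uv <= min(d_u, d_v) - 1, every p lies in (0, 1]."""
    nbrs = [set() for _ in range(n)]
    for u, v in edges:
        nbrs[u].add(v)
        nbrs[v].add(u)
    return {
        (u, v): (len(nbrs[u] & nbrs[v]) + 1) / min(len(nbrs[u]), len(nbrs[v]))
        for u, v in edges
    }


def spectral_embedding(n, probs, k):
    """Rows of the k smallest eigenvectors of the normalized Laplacian, each row
    scaled to unit length. The probability-weighted W is a stand-in for FEC."""
    W = np.zeros((n, n))
    for (u, v), p in probs.items():
        W[u, v] = W[v, u] = p
    degree = W.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(np.maximum(degree, 1e-12))
    L = np.eye(n) - inv_sqrt[:, None] * W * inv_sqrt[None, :]
    _, vecs = np.linalg.eigh(L)  # eigenvalues are returned in ascending order
    X = vecs[:, :k].copy()
    X /= np.maximum(np.linalg.norm(X, axis=1, keepdims=True), 1e-12)
    return X, degree


def density_centers(X, density, k):
    """Hypothetical stand-in for DPCS: start at the densest node, then repeatedly
    pick the node maximizing density * distance to its nearest chosen center."""
    chosen = [int(np.argmax(density))]
    while len(chosen) < k:
        gaps = np.linalg.norm(X[:, None, :] - X[chosen][None, :, :], axis=2)
        chosen.append(int(np.argmax(density * gaps.min(axis=1))))
    return X[chosen].copy()


def fcm(X, centers, m=2.0, iters=100, tol=1e-6):
    """Standard fuzzy C-means iterations starting from the given centers."""
    for _ in range(iters):
        dist = np.maximum(np.linalg.norm(X[:, None, :] - centers[None, :, :], axis=2), 1e-12)
        # u_ij = 1 / sum_l (d_ij / d_il)^(2 / (m - 1))
        U = 1.0 / ((dist[:, :, None] / dist[:, None, :]) ** (2.0 / (m - 1.0))).sum(axis=2)
        Um = U ** m
        new_centers = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.linalg.norm(new_centers - centers)
        centers = new_centers
        if shift < tol:
            break
    return U, centers


def expected_density(module, probs):
    """ASSUMED EED: expected number of internal edges / number of possible edges."""
    members = set(module)
    if len(members) < 2:
        return 0.0
    inside = sum(p for (u, v), p in probs.items() if u in members and v in members)
    return inside / (len(members) * (len(members) - 1) / 2)


def mine_modules(n, edges, k, member_threshold=0.5, density_threshold=0.5):
    """Hypothetical end-to-end pipeline: probabilities -> embedding -> FCM -> filter."""
    probs = edge_probabilities(n, edges)
    X, weighted_degree = spectral_embedding(n, probs, k)
    U, _ = fcm(X, density_centers(X, weighted_degree, k))
    modules = [np.flatnonzero(U[:, j] >= member_threshold).tolist() for j in range(k)]
    return [mod for mod in modules if expected_density(mod, probs) >= density_threshold]


# Toy example: two 4-cliques joined by a single bridge edge (3, 4).
edges = [(0, 1), (0, 2), (0, 3), (1, 2), (1, 3), (2, 3),
         (4, 5), (4, 6), (4, 7), (5, 6), (5, 7), (6, 7), (3, 4)]
print(mine_modules(8, edges, k=2))
```

On this toy graph the bridge edge gets probability 0.25 under the assumed formula while every clique edge gets 1.0, so the sketch should return the two cliques as separate modules. Seeding FCM from dense, well-separated points instead of random ones mirrors the motivation the abstract gives for DPCS: plain FCM is sensitive to its initial centers.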
References
    [1]JI J Z,GAO G X.Detecting functional module method based on cultural algorithm in protein-protein interaction networks[J].Journal of Beijing University of Technology,2017,43(1):13-21.
    [2]YU L,GAO L,SUN P.Research on algorithms for complexes and functional modules prediction in protein-protein interaction networks[J].Chinese Journal of Computers,2011,34(7):1239-1251.
    [3]NI W Y,WANG J X,XIONG H J,et al.Research of detecting functional modules based on uncertainty data[J].Journal of Sichuan University(Engineering Science Edition),2013,45(5):80-87.
    [4]JI J Z,LIU Z J,LIU H X,et al.An overview research on functional module detection for protein-protein interaction networks[J].Acta Automatica Sinica,2014,40(4):577-593.
    [5]LI M,WANG J X,LIU B B,et al.An algorithm for identifying protein complexes based on maximal clique extension[J].Journal of Central South University(Science and Technology),2010,41(2):560-565.
    [6]KESSLER J,ANDRUSHCHENKO V,KAPITAN J,et al.Insight into vibrational circular dichroism of proteins by density functional modeling[J].Physical Chemistry Chemical Physics,2018,20(7):4926-4935.
    [7]ALDECO R,MARIN I.Jerarca:efficient analysis of complex networks using hierarchical clustering[J].PLoS ONE,2010,5(7):11585-11591.
    [8]ABEYSIRIGUNAWARDENA S C,KIM H,LAI J,et al.Evolution of protein-coupled RNA dynamics during hierarchical assembly of ribosomal complexes[J].Nature Communications,2017,8(1):492-500.
    [9]LEI X J,GAO Y,GUO L.Mining protein complexes based on topology potential weight in dynamic protein-protein interaction networks[J].Acta Electronica Sinica,2018,46(1):145-151.
    [10]YAO X H,YAN J W,LIU K F,et al.Tissue-specific network-based genome wide study of amygdala imaging phenotypes to identify functional interaction modules[J].Bioinformatics,2017,33(20):3250-3257.
    [11]FAN Z J,LUO Z,MA Y Z.A spectral clustering algorithm based on fuzzy kernel clustering[J].Computer Engineering,2017,43(11):161-165.
    [12]MADANI S,FAEZ K,AMINGHAFARI M.Identifying similar functional modules by a new hybrid spectral clustering method[J].IET Systems Biology,2012,6(5):175-186.
    [13]QIN G M,GAO L.Spectral clustering for protein complexes in Protein-Protein Interaction(PPI)networks[J].Mathematical and Computer Modelling,2010,52(11/12):2066-2074.
    [14]INOUE K,LI W J,KURATA H.Diffusion model based spectral clustering for protein-protein interaction networks[J].PLoS ONE,2010,5(9):12623-12632.
    [15]NA D E.Exploiting fuzzy spectral clustering in protein-complex detection[D].Changsha:Central South University,2012:22-34.
    [16]TRIVODALIEV K,CINGOVSKA I,KALAJDZISKI S.Protein function prediction by spectral clustering of protein interaction network[C]//Proceedings of the 2011 Database Theory and Application,Bio-Science and Bio-Technology.Berlin:Springer,2011:108-117.
    [17]ZOU Z N,LI J Z,GAO H,et al.Mining frequent subgraph patterns from uncertain graph data[J].IEEE Transactions on Knowledge and Data Engineering,2010,22(9):1203-1218.
    [18]ZHANG Y J,LIN H F,YANG Z H,et al.An uncertain model-based approach for identifying protein complexes in uncertain protein-protein interaction networks[J].BMC Genomics,2017,18(7):743-752.
    [19]ZHAO B H,WANG J X,LI M.Detecting protein complexes based on uncertain graph model[J].IEEE/ACM Transactions on Computational Biology and Bioinformatics,2014,11(3):486-497.
    [20]HALIM Z,WAQAS M,HUSSAIN S F.Clustering large probabilistic graphs using multi-population evolutionary algorithm[J].Information Sciences,2015,317(1):78-95.
    [21]BANO R,RAO K.Graph based gene/protein prediction and clustering over uncertain medical databases[J].Journal of Theoretical and Applied Information Technology,2015,82(3):347-352.
    [22]GAO Y J,MIAO X Y,CHEN G,et al.On efficiently finding reverse k-nearest neighbors over uncertain graphs[J].VLDB Journal,2017,26(4):1-26.
    [23]LI M,ZHANG H H,FEI Y P.Essential protein discovery method based on integration of PPI and gene expression data[J].Journal of Central South University(Science and Technology),2013,44(3):1024-1039.
    [24]HUANG L,DENG L.α-φ-contractive mappings on quasi-partial b-metric spaces[J].Journal of Southwest University(Natural Science Edition),2018,40(3):115-120.
    [25]ZHU R,ZOU Z N,LI J Z.Mining Top-k dense subgraphs from uncertain graphs[J].Chinese Journal of Computers,2016,39(8):1570-1582.
    [26]HU S,XIONG H J,CHEN Z P,et al.Identification of essential proteins based on uncertain networks[J].Journal of Sichuan University(Engineering Science Edition),2014,46(5):116-120.
    [27]WANG L,BO L F,JIAO L C.Density-sensitive spectral clustering[J].Acta Electronica Sinica,2007,35(8):1577-1581.
    [28]RAFAILIDIS D,CONSTANTINOU E,MANOLOPOULOS Y.Landmark selection for spectral clustering based on weighted PageRank[J].Future Generation Computer Systems,2017,68(3):465-472.
    [29]KESEMEN O,TEZEL O,OZKUL E.Fuzzy C-means clustering algorithm for directional data(FCM4DD)[J].Expert Systems with Applications,2016,58:76-82.
    [30]XENARIOS I,SALWINSKI L,DUAN X J,et al.DIP,the database of interacting proteins:a research tool for studying cellular networks of protein interactions[J].Nucleic Acids Research,2002,30(1):303-305.
    [31]PU S,WONG J,TURNER B,et al.Up-to-date catalogues of yeast protein complexes[J].Nucleic Acids Research,2009,37(3):825-831.
    [32]KROGAN N,CAGNEY G,YU H,et al.Global landscape of protein complexes in the yeast Saccharomyces cerevisiae[J].Nature,2006,440(7084):637-643.
    [33]HU S,XIONG H J,LI X Y,et al.Construction of multi-relation protein networks and its application[J].Acta Automatica Sinica,2015,41(12):2155-2163.
    [34]LEI X J,WU S,LIANG G,et al.Clustering and overlapping modules detection in PPI network based on IBFO[J].Proteomics,2013,13(2):278-290.
