Research on Association Rule Mining Algorithms for Biological Data and Their Applications
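Abstract
With progress in genomics and proteomics research and the rapid development of modern biotechnology, high-throughput techniques have produced massive amounts of biological data, providing a data foundation for uncovering the mysteries of life. Biological data are rich in type, high-throughput, and high-dimensional, and they are heterogeneous and changeable, which puts them far beyond the capacity of traditional analysis methods. Analysis of biological data has become the bottleneck of present-day biological research, and the need to process, mine, analyze, and understand these data is increasingly urgent.
     Current biological data analysis has several problems. For example, the algorithms and models used tend to grow ever more complex, and results obtained from black-box algorithms are hard to interpret biologically. Yet the fundamental goal of bioinformatics research is to use biological data to explain the phenomena of life and discover its laws.
     Association rule mining is an important data mining technique. Patterns mined from biological data with it are both biologically meaningful (important) and mathematically significant (discoverable), and their transparent structure makes them easy to interpret. This dissertation studies association rule mining algorithms for biological data and their applications. The main research content includes: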
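     (1) Multi-association rule mining algorithm and its application
     Biological data carry rich information, and with traditional association rule mining alone some meaningful patterns are lost. This dissertation therefore proposes a new form of association rule, the multi-association rule. Building on a formal definition of the multi-association rule, it studies criteria for mining useful multi-association rules and gives a mining algorithm. Multi-association rules are then used to analyze protein structure data, yielding many useful rules; experiments on two other datasets also yield some novel knowledge.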
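     (2) Analyzing protein structure data with quantitative association rules
     In 1961 Anfinsen proposed that the primary sequence of a protein molecule completely determines its spatial structure. Examining this hypothesis requires analyzing several questions: Do different amino acids have different propensities for forming different protein spatial structures? Is the amino acid sequence of a protein random? Do amino acid co-occurrence patterns exist in sequences? Do these patterns have different propensities for forming different spatial structures? Most current research predicts the spatial structure at each site of a protein from its amino acid sequence and is mainly qualitative; comparatively few studies use quantitative methods to analyze the propensity of different amino acids to form different protein structures. This dissertation proposes using quantitative association rules to analyze the association between the amino acid composition of a protein and the formation of its structure. Many useful rules are obtained, and these rules are of reference value for the artificial synthesis of protein molecules.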
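As a rough illustration of what a quantitative association rule over amino acid composition might look like, the sketch below scores a rule of the form "fraction of amino acid X lies in [low, high] => structure class" by its support and confidence. The rule form, the toy records, the interval, and the helper names `composition` and `rule_stats` are all assumptions made for this example; the abstract does not specify the dissertation's actual rule format or data.

```python
from collections import Counter

def composition(seq):
    """Return the fraction of each amino acid in a one-letter sequence."""
    counts = Counter(seq)
    return {aa: n / len(seq) for aa, n in counts.items()}

def rule_stats(records, aa, low, high, target):
    """Support and confidence of the hypothetical rule:
    fraction(aa) in [low, high]  =>  structure == target.
    records is a list of (sequence, structure_label) pairs."""
    # Labels of all records whose composition satisfies the left-hand side.
    lhs_labels = [
        label for seq, label in records
        if low <= composition(seq).get(aa, 0.0) <= high
    ]
    both = sum(1 for label in lhs_labels if label == target)
    support = both / len(records)
    confidence = both / len(lhs_labels) if lhs_labels else 0.0
    return support, confidence

# Made-up fragments and labels, for illustration only.
records = [
    ("AAELAAKA", "helix"),
    ("AEALKAAL", "helix"),
    ("VTVYVTIV", "sheet"),
    ("AAGAVAAV", "sheet"),
]
support, confidence = rule_stats(records, "A", 0.5, 1.0, "helix")
print(support, confidence)
```

Here three fragments are at least half alanine and two of those are labeled helix, so the rule holds in half of all records and in two thirds of the records that match its left-hand side.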
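     (3) Application of clustering and association rule mining to gene expression data analysis
     Because gene expression data are high-dimensional with few samples, mining association rules directly from them is not feasible in practice. This dissertation therefore combines clustering with association rule mining. Gene expression data are first clustered into a number of gene clusters, which reduces the dimensionality of the data to be analyzed. The expression data in each gene cluster are then discretized, with each gene discretized into seven items, and association rules are mined. A large number of rules are obtained; they indicate not only the direction of regulation between genes but also the strength of that regulation.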
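The discretize-then-mine step can be sketched as follows. This is a minimal illustration under stated assumptions, not the dissertation's algorithm: the cut points, the level coding from -3 to 3, the restriction to rules between pairs of genes, and the names `discretize` and `mine_pair_rules` are all hypothetical. Clustering is assumed to have been done already, so the input is a single gene cluster.

```python
from collections import Counter
from itertools import permutations

def discretize(value, cuts=(-2.5, -1.5, -0.5, 0.5, 1.5, 2.5)):
    """Map an expression value to one of seven levels, -3..3.
    The six cut points are assumed values chosen for illustration."""
    return sum(value > c for c in cuts) - 3

def mine_pair_rules(cluster, min_support, min_confidence):
    """Mine rules (geneA, levelA) => (geneB, levelB) within one cluster.
    cluster maps gene name -> list of expression values, one per sample."""
    n_samples = len(next(iter(cluster.values())))
    items = {g: [discretize(v) for v in vals] for g, vals in cluster.items()}
    rules = []
    for a, b in permutations(cluster, 2):
        lhs_counts = Counter(items[a])
        pair_counts = Counter(zip(items[a], items[b]))
        for (level_a, level_b), count in pair_counts.items():
            support = count / n_samples
            confidence = count / lhs_counts[level_a]
            if support >= min_support and confidence >= min_confidence:
                rules.append(((a, level_a), (b, level_b), support, confidence))
    return rules

# Made-up expression values for two genes across four samples.
cluster = {
    "g1": [2.0, 1.8, -2.1, 0.1],
    "g2": [-1.0, -1.2, 1.1, 0.0],
}
rules = mine_pair_rules(cluster, min_support=0.5, min_confidence=1.0)
for rule in rules:
    print(rule)
```

With a signed level coding like this one, the sign of each level could be read as the direction of a change in expression and its magnitude as the strength, which is one way rules might carry both kinds of information.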
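     (4) Mining classification rules from tumor gene expression data
     Classification based on association rules is an active topic in association rule mining, and a great deal of work has been done on it. Because tumor gene expression data have high dimensionality and few samples, it is hard to build a classifier by applying traditional association rule mining algorithms directly. This dissertation therefore proposes a method that mines classification rules directly from tumor gene expression data. The method first extracts classification features from the data, then generates classification rules from those features, and predicts the class of a sample from these rules according to the highest-confidence principle. Experiments show that the method achieves good prediction accuracy and, compared with black-box algorithms, is easy to interpret.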
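Prediction by the highest-confidence principle can be sketched in a few lines. The rule representation (an antecedent set of items, a class label, and a confidence), the item strings, and the function name `predict` are assumptions for illustration; how the dissertation extracts features and generates its rules is not described in the abstract.

```python
def predict(sample_items, rules, default=None):
    """Return the class of the matching rule with the highest confidence.
    rules is a list of (antecedent frozenset, class label, confidence).
    A rule matches when every item of its antecedent is in the sample."""
    matching = [rule for rule in rules if rule[0] <= sample_items]
    if not matching:
        return default
    return max(matching, key=lambda rule: rule[2])[1]

# Hypothetical rules and sample, for illustration only.
rules = [
    (frozenset({"geneA=high"}), "tumor", 0.90),
    (frozenset({"geneB=low"}), "normal", 0.75),
    (frozenset({"geneA=high", "geneC=low"}), "tumor", 0.95),
]
sample = {"geneA=high", "geneB=low"}
label = predict(sample, rules)
print(label)
```

Because the rule that fired is available alongside the prediction, a result can be traced back to specific gene conditions, which is the sense in which such a classifier is easier to interpret than a black-box model.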
With rapid progress in genomics and proteomics research and the invention of more advanced biological technologies, a huge amount of biological data has accumulated, providing the data basis for uncovering the nature of life. Biological data have distinctive features: many categories, high throughput, and high dimensionality. These features make the data very difficult to analyze, because they are far beyond the capacity of traditional statistical analysis methods. Analyzing biological data has become the bottleneck of biological research, and the need to process, mine, analyze, and understand biological data is increasingly urgent.
     Current research on biological data analysis has some problems. For example, increasingly complicated algorithms and models are being adopted, and results from black-box algorithms are hard to interpret biologically. Since the aim of bioinformatics research is to interpret biological phenomena and uncover the nature of life from biological data, more appropriate analysis algorithms are needed.
     Association rule mining is an important data mining technique. With it, patterns that are significant both biologically and mathematically can be found in biological data. This dissertation studies the theory and application of association rule algorithms for analyzing biological data. Its main contents are described below.
     (1) Algorithm for mining multi-association rules and its application
     Biological data are rich in content, much of which cannot be mined with traditional association rule algorithms. To mine more knowledge from biological data, this dissertation presents a new form of association rule, the multi-association rule. It gives the formal definition of the multi-association rule, guidelines for mining useful multi-association rules, and an algorithm for mining them. Applying this algorithm to three datasets yielded many useful rules.
     (2) Analyzing protein structure data using quantitative association rules
     In 1961, Anfinsen proposed that the amino acid sequence of a protein molecule completely determines its spatial structure. To examine this assumption, we can divide it into the following questions: Are the amino acid sequences of proteins random? Do different types of amino acid have different propensities for forming different protein spatial structures? Do co-occurrence patterns exist in amino acid sequences? Do these patterns have different propensities for forming protein spatial structures? Most current research focuses on predicting the spatial structure at each site of a protein from its amino acid sequence, which is qualitative analysis. Little research uses quantitative methods to analyze the propensity of each type of amino acid for forming different protein spatial structures. This dissertation uses quantitative association rules to analyze the association between the amino acid composition of a protein and its spatial structure. Many interesting association rules were obtained experimentally. These rules may give clues about the global interactions among particular sets of amino acids occurring in a protein and about the guiding information that amino acid sequences contain for the development of protein structure. They could prove important in the design and synthesis of artificial peptides outside the cell.
     (3) Applying clustering and association rule mining to gene expression data analysis
     Because gene expression data have high dimensionality and small sample sets, it is practically impossible to mine them directly with an association rule mining algorithm. Accordingly, this dissertation combines clustering with association rule mining to analyze gene expression data. A clustering method is first used to obtain gene clusters; each gene is then discretized into seven items; finally, an association rule mining algorithm extracts many rules from each gene cluster. These rules give information not only about the direction of gene regulation but also about its strength.
     (4) Mining classification rules from tumor gene expression data
     Classification based on association rules is a useful predictive technique. Because gene expression data have high dimensionality but small sample sets, it is hard to construct a classifier from such data with traditional association rule mining methods. Hence, this dissertation provides a new method that mines classification rules directly from gene expression data and constructs a classifier from them. Experimental results show that this method has high predictive accuracy and is easy to interpret biologically.
引文
1.Tan P-n,Steinbach M,Kumar V.数据挖掘导论.北京:人民邮电出版社:2006
    2.郑朝霞,刘廷建.关联规则在股票分析中的应用.成都大学学报 2002;21:46-49
    3.杨洪涛,李梓君.关联规则在房地产广告媒体选择中的应用.计算机工程与应用2006;42:230-232
    4.马振华.现代应用数学手册—概论统计与随机过程卷.北京:清华大学出版社:2000.7
    5.孙啸,陆祖宏,谢建明.生物信息学基础.北京:清华大学出版社:2005.5
    6.罗静初.生物信息学概论.北京:北京大学出版社:2002.4
    7.李巍.生物信息学导论.郑州:郑州大学出版社:2004.10
    8.殷志祥,张家秀.神经网络在蛋白质结构预测中的应用.中国科技信息 2005;1:11
    9.张立明.人工神经网络的模型及其应用.上海:复旦大学出版社:1993
    10.蒋宗礼.人工神经网络导论.北京:高等教育出版社;2001
    11.胡守仁.人工神经网络导论.北京:国防科技大学出版社;1997
    12.钟扬,张亮,赵琼.简明生物信息学.In.北京:高等教育出版社;2001
    13.崔光照,张勋才,曹祥红,董亚非,王延峰.基于动态贝叶斯网络的多时延基因调控网络构建.科学技术与工程 2005;5:1247-1251
    14.李瑶.基因芯片数据分析与处理.北京:化学工业出版社;2006.7
    15.蒋彦,王小行,曹毅,王喜忠.基础生物信息学及应用.北京:清华大学出版社;2003
    16.陈双平,郑浩然,刘海燕,王煦法.蛋白质序列中关联规则发现及其应用.生物物理学报 2006;20(1):171-176
    17.谭义红,李学勇,陈治平.关联规则挖掘在web信息检索中的应用.计算机工程2006;32:57-59
    18.邱洁,过仲阳,苏君毅,戴晓燕,林晖.关联规则及其在灾害天气预测中的应用.华东师范大学学报;2005:165-169
    19.江明华,何中市.Apriori算法在篮球比赛常用技术动作挖掘中的应用.计算机应用 2007
    20.鲍文,于达仁,王伟,徐志强.基于关联规则的火电厂传感器故障检测.中国电机工程学报 2003;23:170-174
    21.Witten IH,Frank E.数据挖掘实用机器学习技术.1n:北京:机械工业出版社;2006
    22.Tan Pang-ning MS,Vipin Kumar.数据挖掘导论.北京:人民邮电出版社;2006
    23.Han J,Kamber M.数据挖掘概念与技术.In:北京:机械工业出版社;2001
    24.李颖新,阮晓钢.一种肿瘤基因表达数据的知识提取方法.电子学报 2004;.32:1479-1482
    25.Pevsner J.生物信息学与功能基因组学.北京:化学工业出版社;2006
    26.W.Mount D.生物信息学:序列与基因组分析.北京:科学出版社;2006
    27.Baxevanis AD,Ouellette BFF.Bioinformatics:A Practical Guide to the Analysis of Genes and Proteins:Wiley-Interscience;2001
    28.Casari G,Andrade MA,Bork P,et al.Challenging times for bioinformatics.In;1995:647-648
    29.http://www.ncbi.nlm.nih.gov/Genbank/.
    30.http://www.expasy.ch/sprot/.
    31.http://www.rcsb.org/pdb/home/home.do.
    32.Alan Kimmel BO.DNA芯片(B辑):数据和分析.北京:科学出版社;2007.1
    33.Blanchard A.Synthetic DNA arrays.In;1998:111-123
    34.Southern EM.DNA microarrays.History and overview.In:Humana Press;2001:1-15
    35.Baldi P,Hatfield GW.DNA microarrays and gene expression:Cambridge University Press New York,NY;2002
    36.Baldi P,Brunak S.Bioinformatics:The machine learning approach:MIT Press Cambridge,USA;1998
    37.Wang JTL,Ma O,Shasha D,Wu CH.Application of neural networks to biological data mining:a case study in protein sequence classification.In:ACM Press New York,NY,USA;2000:305-309
    38.Chuan-bo C,Tao LI.A hybrid neural network system for prediction and recognition of promoter regions in human genome.2005:401-407
    39.Almeida JS.Predictive non-linear modeling of complex data by artificial neural networks.Current Opinion in Biotechnology 2002;13:72-76
    40.Boland MV,Murphy RF.A neural network classifier capable of recognizing the patterns of all major subcellular structures in fluorescence microscope images of HeLa cells.In:Oxford Univ Press;2001:1213-1223
    41.Browne A,Hudson BD,Whitley DC,Ford MG,Picton P.Biological data mining with neural networks:implementation and application of a flexible decision tree extraction algorithm to genomic problem domains.Neurocomputing 2004;57:275-293
    42.Kasabov N,Pang S.Transductive support vector machines and applications in bioinformatics for promoter recognition.In,Neural Networks and Signal Processing,Proceedings of the 2003International Conference:1-6
    43.Bao L,Sun Z.Identifying genes related to drug anticancer mechanisms using support vector machine.FEBS Letters 2002;521:109-114
    44.Wang M,Yang J,Liu GP,Xu ZJ,Chou KC.Weighted-support vector machines for predicting membrane protein types based on pseudo-amino acid composition.Protein Engineering Design and Selection 2004;17:509-516
    45.Cai CZ,Han LY,Ji ZL,Chen YZ.Enzyme family classification by support vector machines.Proteins Structure Function and Bioinformatics 2004;55:66-76
    46.Si-hua P,Long-jiang FAN,Xiao-ning P,et al.Splicing-site recognition of rice(Oryza sativa L.)DNA sequences by support vector machines.JOURNAL OF ZHEJIANG UNIVERSITY (SCIENCE),2003;4:573-577
    47.Bradford JR,Westhead DR.Improved prediction of protein-protein binding sites using a support vector machines approach.Bioinformatics 2005;21:1487-1494
    48.Zien A,Ratsch G,Mika S,et al.Engineering support vector machine kernels that recognize translation initiation sites.Bioinformatics 2000;16:799-807
    49.Yang ZR.Biological applications of support vector machines.Briefings in Bioinformatics 2004;5:328-338
    50.Brown MPS,Grundy WN,Lin D,et al.Knowledge-based analysis of microarray gene expression data by using support vector machines.In,Proceedings of the National Academy of Sciences:National Acad Sciences;2000:262
    51.Hua S,Sun Z.Support vector machine approach for protein subcellular localization prediction.Bioinformatics 2001;17:721-728
    52.Ding CHQ,Dubchak I.Multi-class protein fold recognition using support vector machines and neural networks.Bioinformatics 2001;17:349-358
    53.Cristianini N,Shawe-Taylor J.An Introduction to Support Vector Machines and Other Kernel-based Learning Methods.In:Cambridge University Press;2000
    54.Guyon I,Weston J,Barnhill S,Vapnik V.Gene Selection for Cancer Classification using Support Vector Machines.Machine Learning 2002;46:389-422
    55.Hua S,Sun Z.A novel method of protein secondary structure prediction with high segment overlap measure:support vector machine approach.Journal of Molecular Biology 2001;308:397-407
    56.Ward JJ,McGuffin LJ,Buxton BF,Jones DT.Secondary structure prediction with support vector machines.Bioinformatics 2003;19:1650-1655
    57.Chou KC,Cai YD.Using Functional Domain Composition and Support Vector Machines for Prediction of Protein Subcellular Location.Journal of Biological Chemistry 2002;277:45765-45769
    58.Zhao Y,Pinilla C,Valmori D,Martin R,Simon R.Application of support vector machines for T-cell epitopes prediction.Bioinformatics 2003;19:1978-1984
    59.Zhang SW,Pan Q,Zhang HC,Zhang YL,Wang HY.Classification of protein quaternary structure with support vector machine.Bioinformatics 2003;19:2390-2396
    60.Zavaljevski N,Stevens FJ,Reifman J.Support vector machines with selective kernel scaling for protein classification and identification of key amino acid positions.Bioinformatics 2002;18:689-696
    61.Yang ZR,Chou KC.Bio-support vector machines for computational proteomics.Bioinformatics 2004;20:735-741
    62.Yuan Z,Burrage K,Mattick JS.Prediction of Protein Solvent Accessibility Using Support Vector Machines.Machine Learning 2002;48:566-570
    63.Cai YD,Liu XJ,Xu X,Chou KC.Prediction of protein structural classes by support vector machines.Computers and Chemistry 2002;26:293-296
    64.Cai CZ,Wang WL,Chen YZ.Support vector machine classification of physical and biological datasets.Inter J Mod Phys C 2003;14:575-585
    65.Garg A,Bhasin M,Raghava GPS.Support Vector Machine-based Method for Subcellular Localization of Human Proteins Using Amino Acid Compositions,Their Order,and Similarity Search.Journal of Biological Chemistry 2005;280:14427
    66.Markowetz F.Support Vector Machines in Bioinformatics:http://genomics.princeton.edu/~florian/docs/diplom.pdf;2002
    67.Valentini G.Gene expression data analysis of human lymphoma using support vector machines and output coding ensembles.Artificial Intelligence In Medicine 2002;26:281-304
    68.Xue C,Li F,He T,et al.Classification of real and pseudo microRNA precursors using local structure-sequence features and support vector machine.BMC Bioinformatics 2005;6:310
    69.Markowetz F,Edler L,Vingron M.Support Vector Machines for Protein Fold Class Prediction.Biometrical Journal 2003;45:377-389
    70.Sun XD,Huang RB.Prediction of protein structural classes using support vector machines.In:Springer;2006:469-475
    71.Vinayagam A,Kinig R,Moormann J,et al.Applying Support Vector Machines for Gene ontology based gene function prediction.2005
    72.Liu Y.Active learning with support vector machine applied to gene expression data for cancer classification.J Chem Inf Comput Sci 2004;44:1936-1941
    73.Nguyen MN,Rajapakse JC.Multi-class support vector machines for protein secondary structure prediction.Genome Informatics 2003;14:218-227
    74.Komura D,Nakamura H,Tsutsumi S,Aburatani H,Ihara S.Multidimensional support vector machines for visualization of gene expression data.Bioinformatics 2005;21:439-444
    75.Zhang HH,Ahn J,Lin X,Park C.Gene selection using support vector machines with non-convex penalty.Bioinformatics 2006;22:88-95
    76.ZhiGuo Y,Long M,Long K,et al.Fast Fourier Transform-based Support Vector Machine for Prediction of G-protein Coupled Receptor Subfamilies.生物化学与生物物理学报2005;37:759-766
    77.Valentini G,Muselli M,Ruffino F.Cancer recognition with bagged ensembles of support vector machines.Neurocomputing 2004;56:461-466
    78.Asai K,Hayamizu S,Handa K.Prediction of protein secondary structure by the hidden Markov model.Bioinformatics 2002;9:141-146
    79.Koski T.Hidden Markov Models for Bioinformatics:Kluwer Academic Publishers;2001
    80.Choo KH,Tong JC,Zhang L.Recent Applications of Hidden Markov Models in Computational Biology.GENOMICS 2004;2:84-96
    81.Schliep A,Schonhuth A,Steinhoff C.Using hidden Markov models to analyze gene expression time course data.Bioinformatics 2003;19:255-263
    82.Pedersen JS,Hein J.Gene finding with a hidden Markov model of genome structure and evolution.Bioinformatics 2003;19:219-227
    83.Grundy WN,Bailey TL,Elkan CP,Baker ME.meta-MEME:Motif-based hidden Markov models of protein families.Bioinformatics 2004;13:397-406
    84.Ohler U.Interpolated markov chains for eukaryotic promoter recognition.Bioinformatics 1999;15:362-369
    85.Husmeier D,McGuire G.Detecting Recombination in 4-Taxa DNA Sequence Alignments with Bayesian Hidden Markov Models and Markov Chain Monte Carlo.Bioinformatics 2003;20:315-337
    86.Krogh A,Larsson B,von Heijne G,Sonnhammer ELL.Predicting transmembrane protein topology with a hidden markov model:application to complete genomes.Journal of Molecular Biology 2001;305:567-580
    87.Stanke M,Waack S.Gene prediction with a hidden Markov model and a new intron submodel.Bioinformatics 2003;19:215-225
    88.Bystroff C,Thorsson V,Baker D.HMMSTR:a hidden Markov model for local sequence-structure correlations in proteins.Journal of Molecular Biology 2000;301:173-190
    89.Krogh A.Computationial Methods in Molecular Biology:Chapter 4 An Introduction to Hidden Markov Models for Biological Sequences;1998:45-63
    90.Gough J,Karplus K,Hughey R,Chothia C.Assignment of homology to genome sequences using a library of hidden Markov models that represent all proteins of known structure.Journal of Molecular Biology 2001;313:903-919
    91.Nicolas P,Bize L,Muri F,et al.Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models.Nucleic Acids Research 2002;30:1418-1426
    92.Ellrott K,Yang C,Sladek FM,Jiang T.Identifying transcription factor binding sites through Markov chain optimization.Bioinformatics;l 8:100-109
    93.Bejerano G.Algorithms for variable length Markov chain modeling.Bioinformatics 2004;20:788-789
    94.Ji X,Li-Ling J,Sun Z.Mining gene expression data using a novel approach based on hidden Markov models.In:Elsevier;2003:125-131
    95.Siepel A,Haussler D.Combining phylogenetic and hidden Markov models in biosequence analysis.In,Proceedings of the seventh annual international conference on Research in computational molecular biology:ACM Press New York,NY,USA;2003:277-286
    96.Peshkin L,S.Gelfand M.Segmentation of yeast DNA using hidden Markov models.Bioinformatics 1999;15:980-986
    97.Boys RJ,Henderson DA,Wilkinson DJ.Detecting homogeneous segments in DNA sequences by using hidden Markov models.Applied Statistics 2000;49:269-285
    98.Karchin R,Cline M,Mandel-Gutfreund Y,Karplus K.Hidden Markov models that use predicted local structure for fold recognition:alphabets of backbone geometry.Proteins 2003;51:504-514
    99.Camproux AC,Gautier R,Tuffery P.A Hidden Markov Model Derived Structural Alphabet for Proteins.Journal of Molecular Biology 2004;339:591-605
    100.Husmeier D.Sensitivity and specificity of inferring genetic regulatory interactions from microarray experiments with dynamic Bayesian networks.Bioinformatics 2003;19:2271-2282
    101.Zou M,Conzen SD.A new dynamic Bayesian network(DBN)approach for identifying gene regulatory networks from time course microarray data.Bioinformatics 2005;21:71-79
    102.Kim S,Imoto S,Miyano S.Dynamic Bayesian network and nonparametric regression for nonlinear modeling of gene networks from time series gene expression data.BioSystems 2004;75:57-65
    103.Yu J,Smith VA,Wang PP,Hartemink AJ,Jarvis ED.Advances to Bayesian network inference for generating causal networks from observational biological data.Bioinformatics 2004;20:3594-3603
    104.Perrin BE,Ralaivola L,Mazurie A,et al.Gene networks inference using dynamic Bayesian networks.Bioinformatics 2003;19:138-148
    105.Kim SY,Imoto S,Miyano S.Inferring gene networks from time series microarray data using dynamic Bayesian networks.Bioinformatics 2003;4:228-235
    106.Dojer N,Gambin A,Mizera A,Wilczynski B,Tiuryn J.Applying dynamic Bayesian networks to perturbed gene expression data.BMC Bioinformatics 2006;7:249
    107.van Berlo RJP,van Someren EP,Reinders MJT.Studying the Conditions for Learning Dynamic Bayesian Networks to Discover Genetic Regulatory Networks.SIMULATION,2003;79:689
    108.Kim SY,Imoto S,Miyano S.Dynamic Bayesian Network and Nonparametric Regression Model for Inferring Gene Networks.Genome Informatics 2002;13:371-372
    109.Rice JJ,Tu Y,Stolovitzky G.Reconstructing biological networks using conditional correlation analysis.Bioinformatics 2005;21:765-773
    110.Barber D.Probabilistic Modelling and Reasoning Dynamic Bayesian Networks:Discrete Hidden Variables.In:http://www.anc.ed.ac.uk/~dbarber/pmr/pmr_2003_dynamic_discrete.pdf;2003
    111.Zhang Y,Deng Z,Jiang H,Jia P.Dynamic Bayesian Network(DBN)with Structure Expectation Maximization(SEM)for Modeling of Gene Network from Time Series Gene Expression Data.In:http://ww1.ucmss.com/books/LFS/CSREA2006/BIC4650.pdf
    112.Le Y,Stephen M.Inferring context-sensitive Probabilistic Boolean Networks from gene expression data under multi-biological conditions.BMC Systems Biology 2007;1:P63
    113.Shmulevich I,Dougherty ER,Kim S,Zhang W.Probabilistic Boolean networks:a rule-based uncertainty model for gene regulatory networks.Bioinformatics 2002;18:261-274
    114.Shmulevich I,Dougherty ER,Zhang W.Gene perturbation and intervention in probabilistic Boolean networks.Bioinformatics 2002;18:1319-1331
    115.Pal R,Datta A,Bittner ML,Dougherty ER.Intervention in context-sensitive probabilistic Boolean networks.Bioinformatics 2005;21:1211-1218
    116.Shmulevich I,Dougherty ER,Zhang W.From Boolean to probabilistic Boolean networks as models of genetic regulatory networks.Bioinformatics 2002;90:1778-1792
    117.Shmulevich I,Gluhovsky I,Hashimoto RF,Dougherty ER,Zhang W.Steady-state analysis of genetic regulatory networks modelled by probabilistic Boolean networks.Comparative and Functional Genomics 2003;4:601-608
    118.Hashimoto RF,Kim S,Shmulevich I,et al.Growing genetic regulatory networks from seed genes.Bioinformatics 2004;20:1241-1247
    119.Pal R,Ivanov I,Datta A,Bittner ML,Dougherty ER.Generating Boolean networks with a prescribed attractor structure.Bioinformatics 2005;21:4021-4025
    120.Brun M,Dougherty ER,Shmulevich I.Steady-state probabilities for attractors in probabilistic Boolean networks.Signal Processing 2005;85:1993-2013
    121.Ivanov I,Dougherty ER.Reduction Mappings between Probabilistic Boolean Networks.EURASIP Journal on Applied Signal Processing,2004 2004;2004:125-131
    122.Pal R,Datta A,Dougherty ER.Optimal Infinite-Horizon Control for Probabilistic Boolean Networks.IEEE Transactions on Signal Processing 2006;54:2375-2387
    123.Dougherty ER,Xiao Y.Design of probabilistic Boolean networks under the requirement of contextual data consistency.IEEE Transactions on Signal Processing 2006;54:3603-3613
    124.Marshall S,Xiao Y,Dougherty ER.Inference of a probabilistic Boolean network from a single observed temporal sequence.EURASIP Journal on Bioinformatics and Systems Biology 2007;2007:5-5
    125.Ranadip P,Aniruddha D,Bittner Michael L,Dougherty Edward R.Intervention in context-sensitive probabilistic Boolean networks.Bioinformatics 2005;21:1211-1218
    126.Marshall S,Yu L,Xiao Y,Dougherty ER.Temporal inference of probabilistic Boolean networks.Genomic Signal Processing and Statistics 2006:71-72
    127.Ching WK,Zhang SQ,Jiao Y,Akutsu T,Wong AS.Optimal Finite-Horizon Control for Probabilistic Boolean Networks with Hard Constraints.http://hkumathhkuhk/~imr/lMRPreprintSeries/2007/IMR2007-23.pdf
    128.Ching WK,Zhang S,Ng MK,Akutsu T.An approximation method for solving the steady-state probability distribution of probabilistic Boolean networks.Bioinformatics 2007;23:1511
    129.Pal R,Datta A,Bittner ML,Dougherty ER.External control in a special class of probabilistic boolean networks.In,American Control Conference,2005;2005:411-416
    130.Yeung KY,Haynor DR,Ruzzo WL.Validating clustering for gene expression data.Bioinformatics 2001;17:309-318
    131.Hanisch D,Zien A,Zimmer R,Lengauer T.Co-clustering of biological networks and gene expression data.Bioinformatics 2002;18:145-154
    132.Xing EP,Karp RM.CLIFF:clustering of high-dimensional microarray data via iterative feature filtering using normalized cuts.Bioinformatics;17:306-315
    133.Gat-Viks I,Sharan R,Shamir R.Scoring clustering solutions by their biological relevance.Bioinformatics 2003;19:2381-2389
    134.D'Haeseleer P,Liang S,Somogyi R.Genetic network inference:from co-expression clustering to reverse engineering.Bioinformatics 2000;16:707-726
    135.Adryan B,Schuh R.Gene-Ontology-based clustering of gene expression data.Bioinformatics 2004;20:2851-2852
    136.Tamames J,Clark D,Herrero J,et al.Bioinformatics methods for the analysis of expression arrays:data clustering and information extraction.Journal of Biotechnology 2002;98:269-283
    137.Bar-Joseph Z,Demaine ED,Gifford DK,et al.K-ary clustering with optimal leaf ordering for gene expression data.Bioinformatics 2003;19:1070-1078
    138.King AD,Przuij N,Jurisica I.Protein complex prediction via cost-based clustering.Bioinformatics 2004;20:3013-3020
    139.Iliopoulos I,Enright AJ,Ouzounis CA.Textquest:document clustering of Medline abstracts for concept discovery in molecular biology.Pac Symp Biocomput 2001;6:384-395
    140.Huang D,Pan W.Incorporating biological knowledge into distance-based clustering analysis of microarray gene expression data.Bioinformatics 2006;22:1259-1268
    141.Eisen MB,Spellman PT,Brown PO,Botstein D.Cluster analysis and display of genome-wide expression patterns.Genetics 1998;95:14863-14868
    142.Sherlock G.Analysis of large-scale gene expression data.Brief Bioinformatics 2001;2:350-362
    143.Toronen P,Kolehmainen M,Wong G,Castren E.Analysis of gene expression data using self-organizing maps.FEBS Lett 1999;451:142-146
    144.Herrero J,Valencia A,Dopazo J.A hierarchical unsupervised growing neural network for clustering gene expression patterns.Bioinformatics 2001;17:126-136
    145.Weaver DC,Workman CT,Stormo GD.Modeling regulatory networks with weight matrices.In,Pacific Symposium on Biocomputing;1999:112-123
    146.Reinitz J,Sharp DH.Mechanism of eve stripe formation.Mech Dev 1995;49:133-158
    147.Ernst J,Nau GJ,Bar-Joseph Z.Clustering short time series gene expression data.Bioinformatics 2005;21:S159-S168
    148.Akutsu T,Miyano S,Kuhara S.Identification of genetic networks from a small number of gene expression patterns under the Boolean network model.Pac Symp Biocomput 1999;17:28
    149. Kauffman SA. Metabolic stability and epigenesis in randomly constructed genetic nets. J Theor Biol 1969;22:437-467
    150. Kauffman S. Gene regulation networks: a theory for their global structure and behaviors. Current Topics in Developmental Biology 1971 ;6:145-182
    151. Kauffman S. The large scale structure and dynamics of gene control circuits: an ensemble approach. J Theor Biol 1974;44:167-190
    152. Akutsu T, Miyano S, Kuhara S. Algorithms for Identifying Boolean Networks and Related Biological Networks Based on Matrix Multiplication and Fingerprint Function. Journal of Computational Biology 2000;7:331-343
    153. Somogyi R, Sniegoski CA. Modeling the complexity of genetic networks: understanding multigenic and pleiotropic regulation. Complexity, 1996;1:45-63
    154. Silvescu A, Honavar V. Temporal Boolean Network Models of Genetic Networks and Their Inference from Gene Expression Time Series. Complex Systems 2001;13:54-70
    155. Liang S, Fuhrman S, Somogyi R. REVEAL, a general reverse engineering algorithm for inference of genetic network architectures. Pacific Symposium on Biocomputing 1998;3:22
    156. Friedman N, Linial M, Nachman I, Pe'er D. Using Bayesian Networks to Analyze Expression Data. Journal of Computational Biology 2000;7:601-620
    157. Pearl J. Probabilistic Reasoning in Intelligent Systems: Networks of Plausible Inference: Morgan Kaufmann; 1988
    158. Imoto S, Goto T, Miyano S. ESTIMATION OF GENETIC NETWORKS AND FUNCTIONAL STRUCTURES BETWEEN GENES BY USING BAYESIAN NETWORKS AND NONPARAMETRIC REGRESSION. In, Pacific Symposium on Biocomputing 2002: World Scientific; 2001
    159. Imoto S, Higuchi T, Goto T, et al. Combining microarrays and biological knowledge for estimating gene networks via Bayesian networks. In, Bioinformatics Conference; 2003:104-113
    160. Imoto S, Sunyong K, Goto T, et al. Bayesian network and nonparametric heteroscedastic regression for nonlinear modeling of genetic network. In, Bioinformatics Conference; 2002:219-227
    161. Zhang SQ, Ching WK, Ng MK, Akutsu T. Simulation study in Probabilistic Boolean Network models for genetic regulatory networks. International Journal of Data Mining and Bioinformatics 2007; 1:217-240
    162. Gransson L, Koski T. Using a Dynamic Bayesian Network to Learn Genetic Interactions. In: Technical Report; 2002
    163. Rost B. Review: Protein Secondary Structure Prediction Continues to Rise. Journal of Structural Biology 2001; 134:204-218
    164. Rost B, Sander C. Combining evolutionary information and neural networks to predict protein secondary structure. Proteins: Struct Funct Genet 1994; 19:55-72
    165. References S, Rost B, Sander C. Improved prediction of protein secondary structure by use of sequence profiles and neural networks. Proc Natl Acad Sci US A 1993;90:7558-7562
    166. Kneller DG, Cohen FE, Langridge R. Improvements in protein secondary structure prediction by an enhanced neural network. J Mol Biol 1990;214:171-182
    167. Holley LH, Karplus M. Protein secondary structure prediction with a neural network. In: National Academy of Sciences; 1989:152
    168. Kim H, Park H. Protein secondary structure prediction based on an improved support vector machines approach. Bioinformatics 2003;16:553-560
    169. Cai YD, Liu XJ, Xu X, Zhou GP. Support Vector Machines for predicting protein structural class. BMC Bioinformatics 2004; http://www.biomedcentral.com/content/pdf/1471-2105-2-3.pdf
    170. Agrawal R, Imielinski T, Swami A. Mining association rules between sets of items in large databases. In, Proceeding of the 1993 ACM SIGMOD International Conference on Management of Data: ACM Press New York, NY, USA; 1993:207-216
    171. Pan F, Cong G, Tung AKH, Yang J, Zaki MJ. Carpenter: finding closed patterns in long biological datasets. In, Proceedings of the ninth ACM SIGKDD international conference: ACM Press New York, NY, USA; 2003:637-642
    172. Gao Cong AKHT, Xin Xu, Feng Pan, Jiong Yang. FARMER: Finding Interesting Rule Groups in Microarray Datasets. In, SIGMOD Conference 2004: 143-154
    173. YuQing Miao GC, Bin Song, ZhiHao Wang. TP+Close: Mining Frequent Closed Patterns in Gene Expression Datasets. In, VDMB 2006: 120-130
    174. Creighton C, Hanash S. Mining gene expression databases for association rules. Bioinformatics 2003;i9:79-86
    175. Berrar.D DI, Granzow.M, et al. Analysis of Gene Expression and Drug Activity Data by Knowledge-based Association Mining. In, Proceedings of Critical Assessment of Microarray Data Analysis Techniques(CAMDA'01); 2001:25-28.
    176. Agrawal R SR. Fast algorithms for mining association rules. In, VLDB94; 1994.487-499.
    177. DassowGvon ME, MunroEm OGM. The segment polarity network is a robust development module. Nature 2000;406:188-192
    178. GP.Shapiro. Knowledge discovery in real database:A report on the IJCAI-89 Workshop. AI Magazine 1991;11:68-70
    179. Han J, Kamber M. Data Mining: Concepts and Techniques: Morgan Kaufmann; 2006
    180. Chou.P.A. Optimal Partitioning for Classification and Regression Tress. IEEE Transactions on Pattern Analysis and Machine Intelligence 1991;13:340-354
    181. Murthy.S.K. Automatic Construction of Decision Trees from Data:A Multi-Disciplinary Survey. Data Mining and Knowledge Discovery 1998;2:345-389
    182. Bradley PS, Fayyad U, Reina C. Scaling clustering algorithms to large databases. In, Knowledge Discovery and Data Mining: AAAI Press; 1998:9-15
    183. Wei-ning Q, Ao-ying Z. Analyzing Popular Clustering Algorithms from Different Viewpoints. Journal of Software 2002;13:1382-1394
    184. Yang J, Wang W, Yu PS. Mining Surprising Periodic Patterns. Data Mining and Knowledge Discovery 2004;9:189-216
    185. Han J, Dong G, Yin Y. Efficient mining of partial periodic patterns in time series database. In, ICDE; 1999:106-115
    186. Li Y, Ning P, Wang XS, Jajodia S. Discovering calendar-based temporal association rules. Data & Knowledge Engineering 2003;44:193-218
    187. Srikant R, Agrawal R. Mining quantitative association rules in large relational tables. In, Proceedings of the 1996 ACM SIGMOD international conference: ACM Press New York, NY, USA; 1996:1-12
    188. Hong TP, Kuo CS, Chi SC. Mining association rules from quantitative data. Intelligent Data Analysis 1999;3:363-376
    189. Srikant R, Agrawal R. Mining generalized association rules. Future Generation Computer Systems 1997;13:161-180
    190. Agrawal R, Srikant R. Fast Algorithms for Mining Association Rules in Large Databases. IEEE Transactions on Knowledge and Data Engineering 1994:487-499
    191. Kamber M, Han J, Chiang JY. Metarule-guided mining of multi-dimensional association rules using data cubes. In, KDD 97; 1997:207-210
    192. Srikant R, Vu Q, Agrawal R. Mining association rules with item constraints. In, KDD 97; 1997:67-73
    193. Bayardo Jr RJ, Agrawal R. Mining the most interesting rules. In, International conference on Knowledge discovery and data mining: ACM Press New York, NY, USA; 1999:145-154
    194. Wang K, He Y, Han J. Mining Frequent Itemsets Using Support Constraints. In, Proceedings of the 26th International Conference on VLDB; 2000:43-52
    195. Cheung YL, Fu AWC. Mining frequent itemsets without support threshold: with and without item constraints. In, IEEE Transactions on Knowledge and Data Engineering; 2004:1052-1069
    196. Chan KCC, Au WH. Mining fuzzy association rules. In, Proceedings of the sixth international conference on CIKM: ACM Press New York, NY, USA; 1997:209-215
    197. Luo J, Bridges SM. Mining fuzzy association rules and fuzzy frequency episodes for intrusion detection. International Journal of Intelligent Systems 2000;15:687-703
    198. Wang W, Yang J, Yu PS. Efficient mining of weighted association rules (WAR): ACM Press New York, NY, USA; 2000
    199. Cai CH, Fu AWC, Cheng CH, Kwong WW. Mining association rules with weighted items. In, Database Engineering and Applications Symposium; 1998:68-77
    200. Evfimievski A, Srikant R, Agrawal R, Gehrke J. Privacy preserving mining of association rules. Information Systems 2004;29:343-364
    201. Vaidya J, Clifton C. Privacy Preserving Association Rule Mining in Vertically Partitioned Data. In, ACM SIGKDD 2002: p.1-11
    202. Cheung DW, Han J, Ng V, Wong CY. Maintenance of discovered association rules in large databases: An incremental updating technique. In: IEEE Computer Society Washington, DC, USA; 1996:106-114
    203. Zhang S, Zhang C, Yan X. Post-mining: maintenance of association rules by weighting. Information Systems 2003;28:691-707
    204. Mueller A. Fast Sequential and Parallel Algorithms for Association Rule Mining: A Comparison. In: research directed by Dept. of Computer Science, University of Maryland at College Park; 1995
    205. Agrawal R, Shafer JC. Parallel mining of association rules. IEEE Transactions on Knowledge and Data Engineering 1996;8:962-969
    206. Ozden B, Ramaswamy S, Silberschatz A. Cyclic association rules. In, Proc of the 14th Intl Conf on Data Eng; 1998:412-421
    207. Brin S, Motwani R, Silverstein C. Beyond market baskets: generalizing association rules to correlations. In, ACM SIGMOD Intl Conf on Management of Data: ACM Press New York, NY, USA; 1997:265-276
    208. Savasere A, Omiecinski E, Navathe S. Mining for strong negative associations in a large database of customer transactions. In, the 14th Intl Conf on Data Engineering: IEEE Computer Society Washington, DC, USA; 1998:494-502
    209. Tan PN, Kumar V, Srivastava J. Indirect Association: Mining Higher Order Dependencies in Data: Springer; 2001
    210. Hamano S, Sato M. Mining Indirect Association Rules. In, 4th Industrial Conference on Data Mining: Springer; 2004:106-116
    211. Suzuki E. Autonomous discovery of reliable exception rules. In, KDD97; 1997:259-262
    212. Aggarwal CC, Yu PS. Online generation of association rules. In, ICDE98; 1998:402-411
    213. Relue R, Wu X, Huang H. Efficient runtime generation of association rules. In, CIKM'01: ACM Press New York, NY, USA; 2001:466-473
    214. Bastide Y, Pasquier N, Taouil R, Stumme G, Lakhal L. Mining minimal non-redundant association rules using frequent closed itemsets. Computational Logic 2000:972-986
    215. Zaki MJ. Generating non-redundant association rules. In, International conference of Knowledge discovery and data mining: ACM Press New York, NY, USA; 2000:34-43
    216. Agrawal R, Srikant R. Mining sequential patterns. In, ICDE95; 1995:3-14
    217. Srikant R, Agrawal R. Mining Sequential Patterns: Generalizations and Performance Improvements. In, EDBT96: Springer; 1996
    218. Pei J, Han J, Mortazavi-Asl B, et al. PrefixSpan: Mining Sequential Patterns Efficiently by Prefix-Projected Pattern Growth. In, IEEE conference on Data Engineering; 2001
    219. Pei J, Han J, Wang W. Mining sequential patterns with constraints in large databases. In, Conference on Information and Knowledge Management: ACM Press New York, NY, USA; 2002:18-25
    220. Bahamish HAA, Salam RA, Abdullah R, Osman MA, Rashid NA. Mining protein data using parallel/distributed association rules. In, Information and Communication Technologies: From Theory to Applications; 2004:461-462
    221. Besemann C, Denton A, Yekkirala A, Hutchison R, Anderson M. Differential association rule mining for the study of protein-protein interaction networks. In, Workshop on Data Mining in Bioinformatics; 2004
    222. Frank E, Hall M, Trigg L, Holmes G, Witten IH. Data mining in bioinformatics using Weka. Bioinformatics 2004;20:2479-2481
    223. Kam HJ, Lee D, Lee KH. Mining and interpretation of association rules among protein sequence motifs. Engineering in Medicine and Biology Society 2003;4
    224. Kim GH, Kim MH, Her JH, et al. Efficient Fault Tolerant Apriori Algorithm for Discovering Association Rules among Local Protein Structures. In
    225. Murali TM, Kasif S. Extracting Conserved Gene Expression Motifs from Gene Expression Data. Biocomputing 2002
    226. Oyama T, Kitano K, Satou K, Ito T. Extraction of knowledge on protein-protein interaction by association rule discovery. Bioinformatics 2002;18:705-714
    227. Pfaltz JL, Taylor CM. Closed set mining of biological data. In, Workshop on Data Mining in Bioinformatics; 2002
    228. Rahal I, Ren D, Perera A, et al. Incremental interactive mining of constrained association rules from biological annotation data with nominal features. In: ACM Press New York, NY, USA; 2005:123-127
    229. Wang HC, Lee YS. Gene Network Prediction from Microarray Data by Association Rule and Dynamic Bayesian Network. In, Computational Science and Its Applications: Springer; 2005
    230. Zaki MJ, Morishita S, Rigoutsos I. Report on BIOKDD04: workshop on data mining in Bioinformatics. In, ACM SIGKDD Explorations: ACM Press New York, NY, USA; 2004:153-154
    231. Zaki MJ, Wang JTL, Toivonen HTT. BIOKDD 2002: recent advances in data mining for bioinformatics. In, ACM SIGKDD Explorations: ACM Press New York, NY, USA; 2002:112-114
    232. Ordonez C, Omiecinski E, de Braal L, et al. Mining Constrained Association Rules to Predict Heart Disease. In, IEEE International Conf on Data Mining; 2001:433-440
    233. Metwally A, Agrawal D, El Abbadi A. Using association rules for fraud detection in web advertising networks. In, VLDB: VLDB Endowment; 2005:169-180
    234. Dass R, Mahanti A. An Efficient Technique for Frequent Pattern Mining in Real-Time Business Applications. In, 8th IEEE Hawaii International Conference on System Sciences; 2005
    235. Ahonen-Myka H. Mining all maximal frequent word sequences in a set of sentences. In, CIKM05: ACM Press New York, NY, USA; 2005:255-256
    236. Liu B, Hsu W, Ma Y. Integrating classification and association rule mining. In, Conference on Knowledge Discovery and Data Mining; 1998:80-86
    237. Ozgur A, Tan PN, Kumar V. RBA: An Integrated Framework for Regression Based on Association Rules. In, Intl Conf on Data Mining; 2004
    238. Han EH. Clustering Based on Association Rule Hypergraphs: University of Minnesota, Dept. of Computer Science; 1997
    239. Rashidi HH, Buehler LK. Bioinformatics Basics: Applications in Biological Science and Medicine: CRC Press; 2000
    240. Anfinsen CB, Haber E, Sela M, White FH. The Kinetics of Formation of Native Ribonuclease during Oxidation of the Reduced Polypeptide Chain. In, Proc of the National Academy of Sciences: JSTOR; 1961:1309-1314
    241. Newman DJ, Hettich S, Blake CL, Merz CJ. UCI Repository of machine learning databases. In, [http://www.ics.uci.edu/~mlearn/MLRepository.html]. Irvine, CA: University of California, Department of Information and Computer Science; 1998
    242. http://restools.sdsc.edu/boiotools/biotools9.html.
    243. http://swift.cmbi.ru.nl/gv/dssp/.
    244. http://scop.mrc-lmb.cam.ac.uk/scop/.
    245. White SH, Jacobs RE. The Evolution of Proteins from Random Amino Acid Sequences. I. Evidence from the Lengthwise Distribution of Amino Acids in Modern Protein Sequences. J Mol Evol 1993;36:79-95
    246. Srikant R, Agrawal R. Mining Quantitative Association Rules in Large Relational Tables. In, the ACM SIGMOD Conference on Management of Data, Montreal, Canada; 1996:1-12
    247. Eisen MB. Microarray and Related Data for Analyses. http://rana.lbl.gov/EisenData.htm.
    248. Carmona-Saez P, Chagoyen M, Rodriguez A, et al. Integrated analysis of gene expression by association rules discovery. BMC Bioinformatics 2006;7:1-16
    249. Cong G, Tan KL, Tung AKH, Pan F. Mining Frequent Closed Patterns in Microarray Data. In, IEEE International Conference on Data Mining (ICDM); 2004:143-154
    250. Pan F, Tung AKH, Cong G, Xu X. COBBLER: combining column and row enumeration for closed pattern discovery. In, Proc of the 16th Int Conf on Scientific and Statistical Database Management; 2004:21-30
    251. von Dassow G, Meir E, Munro EM, et al. The segment polarity network is a robust developmental module. Nature 2000;406:188-192
    252. Lander ES. Array of hope. Nature Genetics 1999;21(suppl):3-4
    253. Ramaswamy S, Golub TR. DNA Microarrays in Clinical Oncology. Journal of Clinical Oncology 2002;20:1932-1939
    254. Khan J, Wei JS, Ringner M, et al. Classification and diagnostic prediction of cancers using gene expression profiling and artificial neural networks. Nature Medicine 2001;7:673-679
    255. Singh D, Febbo PG, Ross K, et al. Gene expression correlates of clinical prostate cancer behavior. Cancer Cell 2002;1:203-209
    256. Golub TR, Slonim DK, Tamayo P, et al. Molecular Classification of Cancer: Class Discovery and Class Prediction by Gene Expression Monitoring. Science 1999;286:531-537
    257. http://www.broad.mit.edu/cgi-bin/cancer/datasets.cgi.
