Bioinformatics Models for Text Data and Their Application to Prostate Cancer Research
Abstract
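The vast biomedical literature is a rich resource for biomedical research, but the volume of text is far too large to process by hand. Text mining can automatically extract information of interest from the existing literature. With text mining, the required biomedical texts can be retrieved from literature databases; these unstructured texts contain a large body of research results and experimental data, and text mining can uncover the important information and knowledge they hold. Building on what is found, researchers can go on to generate hypotheses, make inferences and predictions, and guide experiments and deeper investigation.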
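     Cancer has become one of the major malignant diseases affecting human health, and its prevention, diagnosis, and treatment are an important research focus. The large body of cancer-related literature and experimental data that exists as text is valuable material for biomedical research. Drawing on the strengths of text mining in handling text data, many researchers have combined cancer research with text mining, using text mining techniques to discover new knowledge and advance biomedical research.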
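     This dissertation reviews the subtasks of text mining, its general processing workflow, and commonly used datasets and tools, and surveys current applications of text mining in cancer research. Specifically, it: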
     1) introduces the basic concepts, subtasks, and processing workflow of text mining;
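     2) describes in detail a number of commonly used text mining tools and corpora, and compares their strengths, weaknesses, and areas of application;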
     3) analyzes and summarizes the typical workflow of text-mining-based cancer systems biology research;
     4) points out the shortcomings of text mining and the challenges it faces, and proposes targeted solutions for researchers' reference.
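     To mine information from this mass of data, finding the biological terms contained in the text is the key step. Named entity recognition aims to identify entity names of specified types, such as genes and proteins, in text, and it is the foundation for any further information mining. From a computational modeling perspective, biological named entity recognition can be viewed as a sequence segmentation problem, and it has been studied extensively. However, because biological terms are distinctive in word formation, grammar, morphology, semantics, and context, many general-purpose named entity recognition systems perform poorly on them.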
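     Among machine learning methods, the Support Vector Machine (SVM) performs well on small-scale, nonlinear, high-dimensional problems, and it has been widely applied to relation extraction, relation prediction, and pattern recognition. Another machine learning method, the Conditional Random Field (CRF), was introduced to solve the label bias problem of the maximum entropy Markov model, and can be regarded as a maximum entropy model optimized over the entire sequence. CRF is good at sequence labeling problems. In practice, however, both SVM and CRF have shortcomings and restrictions. SVM was originally designed only for binary classification, whereas CRF can handle multi-class problems; CRF usually requires more computation time and memory, but it is well suited to labeling sequence data and is highly stable. Our analysis shows that SVM and CRF are complementary to some extent, and that combining them lets each reinforce the other and yields better results.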
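     This dissertation treats biological named entity recognition as a multi-step task. First, it determines whether a candidate word is a biological word; because this is a binary classification problem, SVM handles it well. If the word is judged to be biological, CRF is then used to determine which category it belongs to. The SVM and CRF results are then merged, and finally a series of algorithms corrects the merged output. Specifically: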
     1) based on the characteristics of biological words, two proposed rules are used to find inconsistencies caused by differences in context;
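     2) one rule is proposed to ensure that each detected term contains as many biological words as possible, and on this basis a term length maximization algorithm is proposed to ensure that the most complete biological term is obtained;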
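     3) to address the inconsistent results that may arise when SVM and CRF are combined, a maximal bidirectional probability method is proposed to analyze the results. The bidirectional probability consists of a forward probability and a backward probability. The forward probability gives the probability of each possible output going forward from the preceding state; the backward probability gives the probability of each possible output going backward from the following state. The state corresponding to the maximum of the two combined is taken as the result.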
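     The correction steps listed above can be illustrated with a short sketch. It is not the dissertation's implementation: the function names, the toy probability tables, and the choice to combine the forward and backward probabilities by multiplication are all assumptions made here for illustration. The source states only that the state with the maximal combined value is selected.

```python
from typing import Dict, List, Sequence, Tuple


def maximize_term_length(is_bio: Sequence[bool]) -> List[Tuple[int, int]]:
    """Merge runs of adjacent biological words into the longest possible spans.

    is_bio[i] is a (hypothetical) binary decision for token i, such as the
    output of the SVM stage. Returns half-open (start, end) token spans.
    """
    spans: List[Tuple[int, int]] = []
    start = None
    for i, flag in enumerate(is_bio):
        if flag and start is None:
            start = i                      # a new term begins here
        elif not flag and start is not None:
            spans.append((start, i))       # the current term ends before token i
            start = None
    if start is not None:                  # term runs to the end of the sentence
        spans.append((start, len(is_bio)))
    return spans


def resolve_by_bidirectional_probability(
    prev_label: str,
    next_label: str,
    candidates: Sequence[str],
    forward: Dict[Tuple[str, str], float],
    backward: Dict[Tuple[str, str], float],
) -> str:
    """Choose the candidate label with the largest forward * backward score.

    forward[(prev, cur)]  : assumed probability of `cur` given the previous label
    backward[(nxt, cur)]  : assumed probability of `cur` given the next label
    Multiplying the two is an assumption; missing pairs score 0.
    """
    def score(label: str) -> float:
        return (forward.get((prev_label, label), 0.0)
                * backward.get((next_label, label), 0.0))

    return max(candidates, key=score)
```

     For example, with per-token decisions `[False, True, True, True, False, False]` the first function returns the single span `(1, 4)`, so three adjacent biological words are kept together as one term instead of being split.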
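     The method was tested on the GENIA dataset and the JNLPBA04 dataset. Multiple evaluation metrics show that combining SVM and CRF gives better results. The basic idea is to make full use of the stability of the CRF model and to exploit SVM's strength in binary classification to improve the CRF's results. However, because SVM and CRF are two different methods, simply combining them can produce inconsistent labels. The correction step alleviates this problem, improving recognition performance while preserving recognition stability.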
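     The abstract does not name the evaluation metrics. As an assumption, the sketch here computes exact-match precision, recall, and F1 over entity spans, which are measures commonly reported for JNLPBA-style recognition tasks. The function name and the span representation are hypothetical.

```python
from typing import Set, Tuple

# (start_token, end_token, entity_type); an illustrative representation
Span = Tuple[int, int, str]


def span_prf(gold: Set[Span], predicted: Set[Span]) -> Tuple[float, float, float]:
    """Exact-match precision, recall, and F1 over entity spans.

    A predicted span counts as correct only if its boundaries and its type
    both match a gold span exactly.
    """
    true_pos = len(gold & predicted)
    precision = true_pos / len(predicted) if predicted else 0.0
    recall = true_pos / len(gold) if gold else 0.0
    if precision + recall == 0.0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1
```

     Under exact matching, a prediction with the right type but a boundary that is off by one token counts as both a false positive and a false negative, which is why boundary corrections such as term length maximization can change the score noticeably.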
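     As biological research has deepened, it has become clear that complex biological functions and life phenomena result from complex interactions among the basic units of living systems, and cannot simply be reduced to the structure and function of individual biomolecules. Studying biomolecular interaction networks in depth, in order to understand how the functions of life are realized through interactions among biomolecules, is a major topic in modern biology.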
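     Reinforcement learning is a machine learning method. Constructing interaction networks within a reinforcement learning framework has several advantages: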
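     1) As a complex disease, cancer has biomolecular interaction networks that are scale-free. With reinforcement learning, agents repeatedly try pairwise interactions, rewards and returns determine which interactions are reinforced, and the network structure emerges as a result of the dynamics of the agents' learning behavior. The scale-free property inherent to the network is preserved.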
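     2) One characteristic of biological problems is that much is unknown. Cancer is a systemic, complex disease, and some of its mechanisms are not yet understood. Reinforcement learning poses the problem of learning optimal behavior in an unknown stochastic environment. Using reinforcement learning ensures that the network converges to an optimal stable state.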
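     3) Reinforcement learning is an open approach that integrates seamlessly with biological knowledge and biological data while the network is being built. Biological data from multiple sources can be used to construct the network, with the different kinds of data complementing one another, so the resulting network is more credible. Within the reinforcement learning framework, biological facts are reinforced rather than the network being built at random, which ensures that the network conforms to the basic properties of complex biological networks.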
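     In the setting of an interaction network, it is inappropriate to consider only the single interaction between two biological entities while ignoring the influence of other entities. This dissertation introduces the concept of comprehensive influence to measure the mutual influence of nodes within the context of their interactions and within the network as a whole. Comprehensive influence comprises the direct influence arising from direct interaction between two biological entities and the indirect influence exerted through other biological entities. Analysis shows that comprehensive influence is better suited to the interaction network setting. We hold that the greater the comprehensive influence, the stronger the interaction between the two entities and the higher the probability that the interaction occurs. Because biological networks are non-random, this dissertation proposes the concept of a network entropy based on comprehensive influence, together with a method for computing it, to measure the irregularity of the distribution of information flow in the network and thereby analyze its stability as it evolves. Because the interaction network that finally forms is not a random network but has a stable topology, the smaller its network entropy, the better.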
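     The abstract gives no formulas for comprehensive influence or network entropy, so the following is only one possible reading, written as a sketch. The assumptions are: direct influence is an edge weight; indirect influence is the sum over two-step paths of the product of their edge weights, scaled by a hypothetical factor `alpha`; and network entropy is the Shannon entropy of the normalized influences over all node pairs. The gene names and weights in the usage note are toy values.

```python
import math
from itertools import combinations
from typing import Dict, FrozenSet, Sequence

# Undirected edge weights keyed by the unordered node pair (illustrative type)
Weights = Dict[FrozenSet[str], float]


def weight(w: Weights, a: str, b: str) -> float:
    """Direct interaction weight between a and b (0 if no edge is recorded)."""
    return w.get(frozenset((a, b)), 0.0)


def comprehensive_influence(w: Weights, nodes: Sequence[str],
                            u: str, v: str, alpha: float = 0.5) -> float:
    """Direct influence plus alpha times the two-step indirect influence.

    Restricting indirect influence to paths u-k-v is an assumption.
    """
    direct = weight(w, u, v)
    indirect = sum(weight(w, u, k) * weight(w, k, v)
                   for k in nodes if k not in (u, v))
    return direct + alpha * indirect


def network_entropy(w: Weights, nodes: Sequence[str]) -> float:
    """Shannon entropy of the normalized comprehensive influences of all pairs."""
    influences = [comprehensive_influence(w, nodes, u, v)
                  for u, v in combinations(nodes, 2)]
    total = sum(influences)
    if total == 0.0:
        return 0.0
    return -sum((x / total) * math.log(x / total) for x in influences if x > 0.0)
```

     With nodes `AR`, `PTEN`, `TP53` and edges `AR-PTEN` (0.8) and `PTEN-TP53` (0.5), the pair `AR`/`TP53` has no direct edge but still receives an influence of 0.5 × 0.8 × 0.5 = 0.2 through `PTEN`. Under this definition, a network whose influence is spread evenly over all pairs has the highest entropy, and a network with a few dominant interactions has a lower one.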
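     Adopting the idea of reinforcement learning, this dissertation proposes an algorithm for constructing interaction networks within the actor-critic framework. Nodes represent biomolecules and edges represent interactions between them. As the network evolves, a node choosing to connect to another node represents a biomolecule choosing another candidate biomolecule in the network to interact with. Each biomolecule makes different decisions at different stages and obtains a corresponding network entropy. The algorithm uses the average reward of all nodes in the current network state as its criterion, selecting repeatedly and iterating until an optimal network has evolved. This network emerges as a result of the dynamics of learning behavior.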
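     A heavily simplified sketch of this kind of learning loop follows. It is not the dissertation's algorithm. Each node keeps softmax preferences over candidate partners (the actor), and a running average of observed rewards serves as a baseline (the critic). Because there are no state transitions, this is a stateless policy-gradient update with a learned baseline, a degenerate one-step form of actor-critic. The reward is a hypothetical evidence score per pair, for example a literature co-occurrence weight, and the network entropy term is omitted. All names and parameter values are assumptions.

```python
import math
import random
from typing import Dict, FrozenSet, List, Sequence, Set


def softmax(prefs: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of preferences."""
    m = max(prefs)
    exps = [math.exp(p - m) for p in prefs]
    total = sum(exps)
    return [e / total for e in exps]


def learn_network(nodes: Sequence[str],
                  evidence: Dict[FrozenSet[str], float],
                  episodes: int = 2000,
                  actor_lr: float = 0.1,
                  critic_lr: float = 0.05,
                  seed: int = 0) -> Set[FrozenSet[str]]:
    """Let every node learn which partner to link to; return the chosen edges."""
    rng = random.Random(seed)
    partners = {u: [v for v in nodes if v != u] for u in nodes}
    prefs = {u: [0.0] * len(partners[u]) for u in nodes}   # actor parameters
    baseline = 0.0                                         # critic: average reward

    for _ in range(episodes):
        for u in nodes:
            probs = softmax(prefs[u])
            choice = rng.choices(range(len(probs)), weights=probs)[0]
            v = partners[u][choice]
            reward = evidence.get(frozenset((u, v)), 0.0)  # hypothetical score
            advantage = reward - baseline
            # Softmax policy-gradient step: raise the chosen partner's
            # preference when the reward beats the baseline, lower it otherwise.
            for i, p in enumerate(probs):
                grad = (1.0 - p) if i == choice else -p
                prefs[u][i] += actor_lr * advantage * grad
            baseline += critic_lr * (reward - baseline)

    # Keep each node's most preferred partner as an undirected edge.
    edges: Set[FrozenSet[str]] = set()
    for u in nodes:
        best = max(range(len(partners[u])), key=lambda i: prefs[u][i])
        edges.add(frozenset((u, partners[u][best])))
    return edges
```

     In a toy setting where only the pairs involving node `A` carry evidence, the other nodes learn to attach to `A`, so `A` emerges as a hub from the learning dynamics rather than being wired in by hand.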
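     Prostate cancer is one of the most common high-incidence malignant tumors and has long been a focus of biological research. Using the proposed method on a text dataset from PubMed, this dissertation builds a protein interaction network for prostate cancer; the results show that the method performs well. Topological analysis also shows that the node degree distribution of the resulting network is scale-free.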
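     The abstract does not say how the scale-free property was checked. One rough diagnostic, shown here as a sketch with hypothetical function names, is to compute the degree distribution P(k) and fit a straight line to log P(k) against log k; an approximately linear relationship with negative slope is consistent with a power law. A least-squares fit on a log-log plot is only a heuristic and not a rigorous statistical test.

```python
import math
from collections import Counter
from typing import Dict, Iterable, Tuple


def degree_distribution(edges: Iterable[Tuple[str, str]]) -> Dict[int, float]:
    """Return P(k): the fraction of nodes that have degree k."""
    degree: Counter = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    counts = Counter(degree.values())
    n = len(degree)
    return {k: c / n for k, c in sorted(counts.items())}


def loglog_slope(dist: Dict[int, float]) -> float:
    """Least-squares slope of log P(k) versus log k.

    Requires at least two distinct degrees; otherwise the slope is undefined.
    """
    if len(dist) < 2:
        raise ValueError("need at least two distinct degrees to fit a slope")
    xs = [math.log(k) for k in dist]
    ys = [math.log(p) for p in dist.values()]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return sxy / sxx
```

     The negated slope is an estimate of the power-law exponent. Networks built from literature data are usually small and noisy, so the estimate should be read as indicative only.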
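     Finally, building on the main methods of this dissertation, a biological text mining system was developed. Its main functions include text retrieval, automatic large-scale text download, biological term recognition, construction of biological interaction networks from text data, and network visualization.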
Biomedical texts provide a wealth of resources for biomedical researchers. However, it is impossible to process this enormous amount of text manually. Text mining can help researchers extract information of interest from existing texts. Through text mining, the required biomedical texts can be retrieved from literature databases; text mining can extract important information and knowledge from these unstructured texts, which contain numerous research results and experimental data; and it can also help generate hypotheses and make predictions that can be used in further research.
     Cancer is one of the most serious diseases affecting human health. Research on cancer prevention, diagnosis, and treatment is one of the hotspots of biomedical research. Biomedical research relies heavily on existing material. There is a large body of cancer-related literature and experimental data, and text mining has an advantage in information processing. Therefore, many researchers have begun to combine cancer research with text mining to discover new knowledge and promote biomedical research.
     In this dissertation, we review the subtasks of text mining, its general workflow, and commonly used datasets and tools, and present some current applications of text mining in cancer research. We also analyze and summarize the typical workflow of text-mining-based cancer systems biology research, and point out the limitations of text mining, together with its challenges and possible solutions.
     To obtain information from massive data by text mining, it is essential to find the biological terms in the texts. Named entity recognition aims to identify entity names of predefined types, such as genes and proteins. However, because biological texts differ from general texts in term structure, grammar, morphology, semantics, and context, many recognition systems fail to identify the terms in biological texts.
     SVM (Support Vector Machine) performs well on small-scale, nonlinear, high-dimensional pattern recognition and other machine learning problems. CRF (Conditional Random Field) is good at solving sequence tagging problems. However, both have limitations and drawbacks. Because they are complementary to some extent, combining the two methods can improve performance.
     In this dissertation, we propose a series of algorithms to detect biological terms in texts. The method uses SVM to determine whether a word is a biological one and then uses CRF to decide the type of each biological word. After the results returned by SVM and CRF are merged, a correction algorithm uses maximal bidirectional probability to remove inconsistencies and ensures that each term has maximal length.
     Test results on the GENIA and JNLPBA04 datasets show that the proposed method yields better results. The basic idea of the proposed method is to take full advantage of SVM to improve the performance of CRF. However, because SVM and CRF are two different methods, simply combining them may cause inconsistency. The correction algorithms resolve this inconsistency, thereby improving recognition performance.
     As biological research has progressed, people have gradually realized that complex biological functions and the phenomena of life are the results of complex interactions among a variety of basic biological units. Studying biomolecular interaction networks in depth, to understand life through a variety of biomolecular interactions, is an important element of modern biology.
     In a network environment, it is unsuitable to consider only the single interaction between two biological entities. In this dissertation, we therefore propose the concept of comprehensive influence to measure interactions in the network context. Comprehensive influence includes the direct interaction between the two nodes that represent two entities and the indirect interaction between them. The results show that comprehensive influence is more suitable for the network environment. We believe that greater influence implies a stronger interaction between two biological entities, as well as a higher probability of occurrence. Because most biological networks are not random networks, we put forward a network entropy evaluation method based on comprehensive influence to measure the irregularity of the network flow distribution, in order to analyze the stability of the network during evolution. Because the final network after iteration is topologically different from a randomly built network, a network with lower network entropy, which indicates greater stability, is better.
     Adopting the idea of reinforcement learning, we put forward an algorithm for forming interaction networks that builds on the actor-critic framework. In the algorithm, nodes represent biomolecules and edges denote interactions. During the evolutionary process, a node selects which nodes in the network it will interact with. Different decisions result in different network entropy values. The average network entropy is used to evaluate the current state. Selection and iteration continue until an optimal network eventually forms. The network emerges as the result of the dynamics of learning behavior.
     Prostate cancer is a malignancy that has long concerned researchers. In this dissertation, we obtain biological texts from PubMed and build a prostate cancer protein interaction network with the proposed methods. The results show that the proposed method performs well. Network topology analysis also shows that the node degree distribution of the network is scale-free.
引文
[1] Cohen K Bretonnel, Hunter Lawrence. Getting Started in Text Mining[J]. PLoSComputational Biology,2008,4(1):1-3.
    [2] Swanson D R. Fish oil, Raynaud's syndrome, and undiscovered publicknowledge[J]. Perspect Biology and Medicine,1986,30(1):7-18.
    [3] Joseph Thomas, Saipradeep Vangala G, Venkat Raghavan, et al. TPX: Biomedicalliterature search made easy[J]. Bioinformation,2012,8(12):578-580.
    [4] Yen-Ching CR, Tzong-Han Tsai, Wen-Lian Hsu. New Challenges for BiologicalText-Mining in the Next Decade[J]. Journal of Computer Science and Technology25:169-179.
    [5] Karin M Verspoor, Judith D Cohn, Komandur E Ravikumar, Michael E Wall. TextMining Improves Prediction of Protein Functional Sites[J]. PLoS One,2012,7(2):1-16.
    [6] Elisa Donnard, Adriano Barbosa-Silva, Rafael LM Guedes, et al. Preimplantationdevelopment regulatory pathway construction through a text-mining approach[J].BMC Genomics,2011,12(Suppl4):1-13.
    [7] Udo Hahn, K Bretonnel Cohen, Yael Garten, et al. Mining the pharmacogenomicsliterature—a survey of the state of the art[J]. Brief Bioinform,2012,13(4):460-494.
    [8] Azuaje F J, Heymann M, Ternes A M, et al. Bioinformatics as a driver, not apassenger, of translational biomedical research: perspectives from the6th Beneluxbioinformatics conference[J]. Journal of Clinical Bioinformatics,2012,2(7):1-3.
    [9] Richard S Sutton, Andrew G. Barto. Reinforcement Learning[M]. MIT Press,1998.
    [10]陈学松,杨宜民.强化学习研究综述[J].计算机应用研究,2010,27(8):2834-2844.
    [11] Cornelius Weber, Mark Elshaw, Norbert Michael Mayer. ReinforcementLearning:Theory and Applications[M]. Croatia: I-Tech Education and Publishing,2008.
    [12] Farhang Sahba, Hamid R Tizhoosh, Magdy M A Salama. A ReinforcementLearning Framework for Medical Image Segmentation[C]. In: Proceedings of IEEEInternational Joint Conference on Neural Networks,2006, pp.511-517.
    [13] Cang Ye, Nelson H C Yung, Wang Danwei. A Fuzzy Controller With SupervisedLearning Assisted Reinforcement Learning Algorithm for Obstacle Avoidance[J].IEEE Transactions on Systems, Man, and Cybernetics,2003,33(1):17-27.
    [14] Alison Watts. A Dynamic Model of Network Formation[J]. Games and EconomicBehavior,2001,34:331-341.
    [15] Brian Skyrms, Robin Pemantle, A Dynamic Model of Social Network Formation[J].PNAS,2000,97(16):9340-9345.
    [16] Jorrit J Hornberga, Frank J Bruggemana, Hans V Westerhoffa. Cancer: a SystemsBiology disease[J]. Biosystems,2006,83(2-3):81-90.
    [17] Leland H Hartwell, John J Hopfield, Stanislas Leibler, et al. From molecular tomodular cell biology[J]. Nature,1999.402:47-52.
    [18]何大韧,刘宗华,汪秉宏.复杂系统与复杂网络[M].高等教育出版社,2012年5月.
    [19]孙之荣,等.系统生物学-哲学基础[M].科学出版社,2008年8月.
    [20] Arunachalam Vinayagam, Ulrich Stelzl, Raphaele Foulle, et al. A Directed ProteinInteraction Network for Investigating Intracellular Signal Transduction[J]. ScienceSignaling,2011,4(189):rs8.
    [21] M Shahid Mukhtar, Anne-Ruxandra Carvunis, Matija Dreze, et al. IndependentlyEvolved Virulence Effectors Converge onto Hubs in a Plant Immune SystemNetwork[J]. Science,2011,333(6042):596-601.
    [22] Ahmedin Jemal, DVM, Freddie Bray, Melissa M. Center, Jacques Ferlay, ElizabethWard, David Forman. Global Cancer Statistics[J]. CA: A Cancer Journal forClinicians,2011,61(2):69-90.
    [23]张琴,顾治华,倪红伟.上海市徐汇区2000-2010年恶性肿瘤发病率统计分析[J].山西医药杂志,2012,41(6):561-563.
    [24] http://www.ncbi.nlm.nih.gov/pubmed/.
    [25]宋涛,洪宝发,高江平,张磊,蔡伟.前列腺癌组织中凋亡抑制基因Livin的表达研究[J].中华男科学杂志,2008,14(1):30-33.
    [26]蒋宏毅,赵晓昆,钟朝晖,等. PTEN/MMAC1/TEP1、TGF-β1在前列腺癌及前列腺增生中的表达及其意义[J].中国现代医学杂志,2008,18(9):1221-1225.
    [27]丁国芳,李继承,徐银峰,等.中国人前列腺癌VEGF-C mRNA、VEGFR-3和CD31表达与肿瘤转移的关系[J].实验生物学报,2005,38(3):257-264.
    [28]刘艳波,沈维高,葛贺,等. surviving和GRIM-19在前列腺癌组织中的表达[J].中华男科学杂志,2011,17(1):21-26.
    [29]尹玉,李明,李昊,等.6种microRNAs在前列腺癌组织中的表达[J].中华男科学杂志,2010,16(7):599-605.
    [30]易晓明,周文泉.前列腺癌的表观遗传学研究进展[J].中华男科学杂志,2010,16(7):635-641.
    [31]刘永霞,朱小丽,陈玉华,等,雄激素非依赖前列腺癌细胞系代谢组学的初步研究[J].分析化学,2011.39(3): p.305-311.
    [32] Pei Hao, Siyuan Zheng, Jie Ping, et al. Human gene expression sensitivityaccording to large scale meta-analysis[J]. BMC Bioinformatics,2009,10(Suppl1:S56):1-7.
    [33] Chen X, Liang S, Zheng W, Liao Z, et al. Meta-analysis of nasopharyngealcarcinoma microarray data explores mechanism of EBV-regulated neoplastictransformation[J]. BMC Genomics,2008,9(322):1-11.
    [34] X Yang, X Sun. Meta-analysis of several gene lists for distinct types of cancer: asimple way to reveal common prognostic markers[J]. BMC Bioinformatics,2007,8(118):1-17.
    [35] Rhodes D R, Barrette T R, Rubin M A, et al. Meta-analysis of microarrays:interstudy validation of gene expression profiles reveals pathway dysregulation inprostate cancer[J]. Cancer Research,2002,62(15):4427-4433.
    [36] Ghosh D, Barette T R, Rhodes D. Statistical issues and methods for meta-analysisof microarray data: a case study in prostate cancer[J]. Funct Integr Genomics,2003,3(4):180-188.
    [37] GO, http://www.geneontology.org/.
    [38] A Lewin, I C Grieve. Grouping Gene Ontology terms to improve the assessment ofgene set enrichment in microarray data[J]. BMC Bioinformatics,2006,7(426):1-9.
    [39] Frijters R, Heupers B, van Beek P, et al. CoPub: a literature-based keywordenrichment tool for microarray data analysis[J]. Nucleic Acids Research,2008,36(Web Server issue):406-410.
    [40] Lian Heng. MOST: detecting cancer differential gene expression[J]. Biostatistics,2008,9(3):411-418.
    [41] F Liu, B Wu. Multi-group cancer outlier differential gene expression detection[J].Computational Biol Chem,2007,31(2):65-71.
    [42] James W MacDonald, Debashis Ghosh. COPA--cancer outlier profile analysis[J].Bioinformatics,2006,22(23):2950-2951.
    [43] R Tibshirani, T Hastie. Outlier sums for differential gene expression analysis[J].Biostatistics,2007.8(1):2-8.
    [44] Baolin Wu. Cancer outlier differential gene expression detection[J]. Biostatistics,2007,8(3): p.566-75.
    [45] E T Liu. Integrative biology--a strategy for systems biomedicine[J]. NatureReviews Genetics,2009,10(1): p.64-68.
    [46] A Ng, B Bursteinas, Q Gao, et al. Resources for integrative systems biology: fromdata through databases to networks and dynamic system models[J]. BriefBioinform,2006,7(4):318-230.
    [47] E D Coelho, J P Arrais, J L Oliveira. From protein-protein interactions to rationaldrug design: are computational methods up to the challenge?[J]. Curr Top MedChem,2013,13(5):602-618.
    [48]李舟军,陈义明,刘军万,等.蛋白质相互作用研究中的计算方法综述[J].计算机研究与发展,2008,45(12):2129-2137.
    [49]雷秀娟,田建芳.蛋白质相互作用网络的蜂群信息流聚类模型与算法[J].计算机学报,2012,35(1):1-12.
    [50] Xose S Puente, Luis M Sánchez, Christopher M Overal, et al. Human and mouseproteases: a comparative genomic approach[J]. Nature Reviews Genetics,2003,4:544-558.
    [51]林标杨.系统生物学[M].浙江大学出版社,2012年6月.
    [52] Rozenblatt Rosen O, Deo RC, Padi M, et al. Interpreting Cancer Genomes UsingSystematic Host Network Perturbations by Tumour Virus Proteins[J]. Nature,2012,487(7408):491-495.
    [53] Xuebing Wu, Rui Jiang, Michael Q Zhang, et al. Network Based Global Inferenceof Human Disease Genes[J]. Molecular Systems Biology,2008,4(189):1-11.
    [54] Albert László Barabási, Natali Gulbahce, et al. Network Medicine: ANetwork-Based Approach to Human Disease[J]. Nature,2011,12(1):56-68.
    [55] Hanyu Chuang, Eunjung Lee, Yutsueng Liu, et al. Network-Based Classification ofBreast Cancer Metastasis[J]. Molecular Systems Biology,2007,3(140):1-10.
    [56] Albert LászlóBarabási. Scale-Free Networks: A Decade and Beyond[J]. Science,2009,325:412-413.
    [57] Müller HM, Rangarajan A, Teal TK, et al. Textpresso For Neuroscience: Searchingthe Full Text of Thousands of Neuroscience Research Papers[J]. Neuroinformatics,2008,6(3):195-204.
    [58] Müller HM, Kenny EE, Sternberg PW. Textpresso: An Ontology-BasedInformation Retrieval and Extraction System for Biological Literature[J]. PLoSBiology,2004,2(11):1984-1988.
    [59] Dietrich Rebholz-Schuhmann, Harald Kirsch, Miguel Arregui, et al. Proteinannotation by EBIMed[J]. Nature,2006,24(8):902-903.
    [60] Andreas Doms, Michael Schroeder. Gopubmed: Exploring Pubmed With the GeneOntology[J]. Nucleic Acids Research,2005,33(2):783-7836.
    [61] Dieter Maier, Wenzel Kalus, Martin Wolff, et al. Knowledge Management forSystems Biology A General and Visually Driven Framework Applied toTranslational Medicine[J]. BMC System Biology,2011,5(38):1-16.
    [62] Sérgio Matos, Joel P Arrais, Jo o Maia Rodrigues. Concept-Based QueryExpansion for Retrieving Gene Related Publications from MEDLINE[J]. BMCBioinformatics,2010,11(212):1-9.
    [63] Guodong Zhou, Jie Zhang, Jian Su, et al. Recognizing Names in Biomedical Texts:A Machine Learning Approach[J]. Bioinformatics,2004,20(7):1178-1190.
    [64] Y L Yeganova, L Smith, W J Wilbur. Identification of Related Gene/Protein NamesBased on an HMM of Name Variations[J]. Computational Biology Chemistry,2004,28(2):97-107.
    [65] Jie Zhanga, Dan Shen, Guodong Zhou, et al. Enhancing HMM-Based BiomedicalNamed Entity Recognition by Studying Special Phenomena[J]. Journal ofBiomedical Informatics,2004,37(6):411-422.
    [66] Mona Soliman Habib, Jugal Kalita. Scalable Biomedical Named Entity Recognition:Investigation of a Database-Supported SVM Approach[J]. International JournalBioinformatics Research and Application,2010,6(2):191-208.
    [67] Zhou, Goudong. Recognizing Names in Biomedical Texts Using MutualInformation Independence Model and SVM Plus Sigmoid[J]. International Journalof Medical Informatics,2006,75(6): p.456-467.
    [68] Lee Ki Joong, Hwang Young Sook, Kim Seonho, et al. Biomedical Named EntityRecognition Using Two-Phase Model Based on Svms[J]. Journal BiomedicalInformatics,2004,37(6):436-447.
    [69] Lishuang Li, Rongpeng Zhou, Degen Huang. Two-Phase Biomedical Named EntityRecognition Using Crfs[J]. Computational Biology and Chemistry,2009,33(4):334-338.
    [70] Zhu Fei, Shen Bairong. Combined SVM-CRFs for Biological Named EntityRecognition with Maximal Bidirectional Squeezing[J]. PLoS One,2012,7(8):1-8.
    [71] Jian Wang, Wenwu Shao, Fei Zhu. Biological Terms Boundary Identification byMaximum Entropy Model[C]. In the6th IEEE Conference on Industrial Electronicsand Applications,2011, Beijin, pp.2446-2448.
    [72] Soumya Raychaudhuri, Jeffrey T Chang, Patrick D Sutphin, et al. AssociatingGenes With Gene Ontology Codes Using a Maximum Entropy Analysis ofBiomedical Literature[J]. Genome Research,2002,12(1):203-214.
    [73] Sujan Kumar Saha, Sudeshna Sarkar, Pabitra Mitra. Feature Selection Techniquesfor Maximum Entropy Based Biomedical Named Entity Recognition[J]. JournalBiomedical Informatics,2009,42(5):905-911.
    [74] Jun'ichi Kazama Takaki, Takaki Makino, Yoshihiro Ohta. Tuning Support VectorMachines for Biomedical Named Entity Recongnition[C]. In: Proceedings of theACL-02Workshop on Natural Language Processing in the Biomedical Domain,2002, pp.1-8.
    [75] Guo Dong Zhou, Dan Shen, Jie Zhang, et al. Recognition of Protein/Gene Namesfrom Text Using an Ensemble of Classifiers[J]. BMC Bioinformatics,2005,6(Suppl1: S7):1-7.
    [76] Lin Jou Wei, Chang Chia Hsuin, Lin Ming Wei, et al. Automating The Process ofCritical Appraisal and Assessing the Strength of Evidence with InformationExtraction Technology[J]. Journal of Evaluation in Clinical Practice,2011,17(4):832-838.
    [77] Tzonghan Tsai, Wenchi Chou, Shihhung Wu. Integrating Linguistic Knowledgeinto a Conditional Random Field Framework to Identify Biomedical NamedEntities[J]. Expert Systems with Applications,2006,30(1):117-128.
    [78] Jin Dong Kim, Tomoko Ohta, Yuka Tateisi, et al. GENIA Corpus-SemanticallyAnnotated Corpus for Bio-Textmining[J]. Bioinformatics,2003,19(Suppl1):180-182.
    [79] GENIA, http://www-tsujii.is.s.u-tokyo.ac.jp/~genia/geniaform.cgi.
    [80] Mandel M A. Integrated Annotation of Biomedical Text: Creating The PennbioieCorpus[C]. In: Proceedings of The Workshop on Text Mining, Ontologies andNatural Language Processing in Biomedicine,2006, pp.1-5.
    [81] PennBioIE, http://truffula.ldc.upenn.edu:8080/publications/latest_release/.
    [82] Lorraine Tanabe, Natalie Xie, Lynne H Thom, et al. GENETAG: A Tagged Corpusfor Gene/Protein Named Entity Recognition[J]. BMC Bioinformatics,2005,6(Suppl1S3):1-7.
    [83] GENETAG, ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/.
    [84] Kristofer Franze′na, Gunnar Erikssona, Fredrik Olssona, et al. Protein Names andHow to Find Them[J]. International Journal of Medical Informatics,2002,67(1-3):49-61.
    [85] Yapex, http://www.sics.se/humle/projects/prothalt/.
    [86] Helen L Johnson, William A Baumgartner, Martin Krallinger. Corpus Refactoring:A Feasibility Study[J]. Journal of Biomedical Discovery and Collaboration,2007,2(4):1-11.
    [87] PICorpus, http://bionlp-corpora.sourceforge.net/picorpus/index.shtml.
    [88] Sampo Pyysalo, Filip Ginter, Juho Heimonen. Bioinfer: A Corpus for InformationExtraction in the Biomedical Domain[J]. BMC Bioinformatics,2007,8(50):1-24.
    [89] Ginter Filip, Pyysalo Sampo, Bj rne Jari, et al. BioInfer relationship annotationmanual[R]. TUCS Technical Report,2007.
    [90] BioInfer, http://mars.cs.utu.fi/BioInfer/.
    [91] Razvan Bunescua, Ruifang Gea, Rohit J Katea, et al. Comparative Experiments onLearning Information Extractors for Proteins and Their Interactions[J]. ArtificialIntelligence in Medicine,2005,33(2):139-155.
    [92] Texas, http://www.cs.utexas.edu/users/ml/index.cgi?page=resourcesrepo.
    [93] Jin Dong Kim, Tomoko Ohta, Jun'ichi Tsujii. Corpus Annotation for MiningBiomedical Events from Literature[J]. BMC Bioinformatics,2008,9(10):1-25.
    [94] GENIAevents,http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=Event+Annotation.
    [95] Yuka Tateisi, Tomoko Ohta, Nigel Collier, et al. Building an Annotated Corpus inThe Molecular-Biology Domain[C]. In: Proceeding of COLING SAIC Workshop,2000, pp.28-34.
    [96] Bio1, http://research.nii.ac.jp/~collier/resources/bio1.1.xml.
    [97] Jin Dong Kim, Tomoko Ohta, Yoshimasa Tsuruoka, et al. Introduction to TheBio-Entity Recognition Task at JNLPBA[C]. In: Proceedings of the InternationalJoint Workshop on Natural Language Processing in Biomedicine and ItsApplications,2004, pp.70-75.
    [98] BioNLP/JNLPBA-2004,http://research.nii.ac.jp/~collier/workshops/JNLPBA04st.htm.
    [99] Inderjeet Mani, Zhangzhi Hu, Seok Bae Jang, et al. Protein Name TaggingGuidelines: Lessons Learned[J]. Comparative Functional Genomics,2005.6(1-2):72-76.
    [100] PIR, http://pir.georgetown.edu/pirwww/iprolink/protname.shtml.
    [101] Burr Settles. ABNER: An Open Source Tool for Automatically Tagging Genes,Proteins and Other Entity Names in Text[J]. Bioinformatics,2005.21(14):3191-3192.
    [102] ABNER, http://pages.cs.wisc.edu/~bsettles/abner/.
    [103] Yoshimasa Tsuruoka, Jun'ichi Tsujii. Bidirectional Inference with The Easiest-FirstStrategy for Tagging Sequence Data[C]. In: Proceedings of the conference onHuman Language Technology and Empirical Methods in Natural LanguageProcessing Association for Computational Linguistics,2005, pp.467-474.
    [104] GENIATagger,http://www-tsujii.is.s.u-tokyo.ac.jp/GENIA/home/wiki.cgi?page=GENIA+Tagger.
    [105] Bob Carpenter. LingPipe for99.99%recall of gene mentions[C]. In: Proceedings ofthe2nd BioCreative Workshop, Valencia, Spain,2007, pp.1-3.
    [106] Bob Carpenter. Character language models for Chinese word segmentation andnamed entity recognition[C]. In: Proceeding of Association for ComputationalLinguistics,2006, pp.169-172.
    [107] LingPipe, http://www.alias-i.com/lingpipe/.
    [108] Jeffrey T Chang, Hinrich Schütze, Russ B Altman. GAPSCORE: finding gene andprotein names one word at a time[J]. Bioinformatics,2004.20(2):216-225.
    [109] GAPSCORE, http://bionlp.stanford.edu/gapscore/.
    [110] Lorraine Tanabe, W John Wilbur. Tagging gene and protein names in biomedicaltext[J]. Bioinformatics,2002,18(8):1124-1132.
    [111] AbGene, ftp://ftp.ncbi.nlm.nih.gov/pub/tanabe/AbGene/.
    [112] Fukuda K, Tamura A, Tsunoda T, Takagi T. Toward information extraction:identifying protein names from biological papers[C]. In: Proceeding of PacificSymposium Biocomputing,1998, pp.707-718.
    [113] KeX, http://www.hgc.jp/service/tooldoc/KeX/intro.html.
    [114] Cecilia N Arighi, Phoebe M Roberts, Shashank Agarwal, et al. BioCreative IIIinteractive task: an overview[J]. BMC Bioinformatics,2011,12(Suppl8: S4):1-21.
    [115] Jeffrey T Chang, Hinrich Schütze, Russ B Altman. Creating an Online Dictionaryof Abbreviations from MEDLINE[J]. Journal of American Medical InformationAssocation,2002,9(6):612-620.
    [116] Hong Yu, George Hripcsak, Carol Friedman. Mapping abbreviations to full formsin biomedical articles[J]. Journal of American Medical Information Assocation,2002,9(3):262-272
    [117] Ariel S Schwartz, Marti A Hearst. A simple algorithm for identifying abbreviationdefinitions in biomedical text[C]. In: Proceeding of Pacific SymposiumBiocomputing,2003, pp.451-62.
    [118] Liu H, Friedman C. Mining terminological knowledge in large biomedicalcorpora[C]. In: Proceeding of Pacific Symposium on Biocomputing,2003,pp.415-426.
    [119] John McCrae, Nigel Collier. Synonym set extraction from the biomedical literatureby lexical pattern discovery[J]. BMC Bioinformatics,2008,9(159):1-13.
    [120] Cohen A M, Hersh W R, Dubay C, et al. Using co-occurrence network structure toextract synonymous gene and protein names from MEDLINE abstracts[J]. BMCBioinformatics,2005,6(103):1-15.
    [121] LLL05, http://genome.jouy.inra.fr/texte/LLLchallenge/.
    [122] BioCreative,http://www.pdg.cnb.uam.es/BioLINK/workshop_BioCreative_04/results/.
    [123] ALICE, http://3.uvdb.dbcls.jp/ALICE/ALICE_index.html.
    [124] Medstract, http://medstract.org/.
    [125] BioText, http://biotext.berkeley.edu/data.html.
    [126] ADAM, http://128.248.65.210/arrowsmith_uic/adam.html.
    [127] Wei Zhou, Vetle I Torvik, Neil R Smalheiser. ADAM: another database ofabbreviations in MEDLINE[J]. Bioinformatics,2006.22(22):2813-2838.
    [128] LRABR, http://www.nlm.nih.gov/pubs/factsheets/umls.html.
    [129] Hiroko Ao, Toshihisa Takagi. ALICE: an algorithm to extract abbreviations fromMEDLINE[J]. Journal of American Medical Informatics Association,2005,12(5):576-586.
    [130] ARGH, http://invention.swmed.edu/argh/.
    [131] J D Wren, H R Garner. Heuristics for identification of acronym-definition patternswithin text: towards an automated construction of comprehensiveacronym-definition dictionaries[J]. Methods Information in Medicine,2002,41(5):426-434.
    [132] S&H, http://biotext.berkeley.edu/software.html.
    [133] Acromine, http://www.nactem.ac.uk/software/acromine/.
    [134] Naoaki Okazaki, Sophia Ananiadou. Building an abbreviation dictionary using aterm recognition approach[J]. Bioinformatics,2006,22(24):3089-3095.
    [135] SFThesaurus, http://gauss.dbb.georgetown.edu/liblab/SFThesaurus/.
    [136] Manabu Torii, Zhangzhi Hu, Min Song, et al. A comparison study on algorithms ofdetecting long forms for short forms in biomedical text[J]. BMC Bioinformatics,2007.8(Suppl9: S5):1-9.
    [137] AcronymAttic, http://www.acronymattic.com/.
    [138] SpecialDictionary, http://www.special-dictionary.com/acronyms/.
    [139] Abbreviation, http://www.abbreviations.com/.
    [140] ACRONYMA, http://www.acronyma.com/.
    [141] Donna Maglott, Jim Ostell, Kim D Pruitt, et al. Entrez Gene: gene-centeredinformation at NCBI[J]. Nucleic Acids Research,2005,1(33):54-58.
    [142] UniProtKB, http://www.uniprot.org/help/uniprotkb.
    [143] Rimer M, O'Connell M. BioABACUS: a database of abbreviations and acronyms inbiotechnology and computer science[J].Bioinformatics,1998,14(10):888-889.
    [144] Asma Ben Abacha, Pierre Zweigenbaum. Automatic extraction of semanticrelations between medical entities: a rule based approach[J]. Journal of BiomedicalSemantics,2011,2(Suppl5: S4):1-11.
    [145] Cory B Giles, Jonathan D Wren. Large-scale directional relationship extraction andresolution[J]. BMC Bioinformatics,2008,9(Suppl9:S11):1-13.
    [146] Hong Woo Chun, Yoshimasa Tsuruoka, Jin Dong Kim, et al. Automaticrecognition of topic-classified relations between prostate cancer and genes usingMEDLINE abstracts[J]. BMC Bioinformatics,2006,7(Suppl3: S4):1-9.
    [147] Eskin E, Agichtein E. Combining text mining and sequence analysis to discoverprotein functional regions[C]. In: Proceeding of Pacific Symposium onBiocomputing,2004,288-299.
    [148] Li X, Cai H, Xu J, Ying S, Zhang Y. A mouse protein interactome throughcombined literature mining with multiple sources of interaction evidence[J]. AminoAcids,2010,38:1237-1252.
    [149] Shashank Agarwal, Feifan Liu, Hong Yu. Simple and efficient machine learningframeworks for identifying protein-protein interaction relevant articles andexperimental methods used to study the interactions[J]. BMC Bioinformatics,2011,12(Suppl8: S10):1-8.
    [150] Richard Tzong Han Tsai, Po Ting Lai, Hong Jie Dai, et al. HypertenGene:extracting key hypertension genes from biomedical literature with position andautomatically-generated template features[J]. BMC Bioinformatics,2009,10(Suppl15: S9):1-9.
    [151] Krallinger Martin, Rojas Ana María, Valencia Alfonso. Creating reference datasetsfor systems biology applications using text mining[J]. Annals of the New YorkAcademy of Sciences,2009,1158(1):14-28.
    [152] Krallinger Martin, Miguel Vazquez, Florian Leitner, et al. The Protein-ProteinInteraction tasks of BioCreative III: classification/ranking of articles and linkingbio-ontology concepts to full text[J]. BMC Bioinformatics,2011,12(Suppl8:S3):1-31.
    [153] Krallinger Martin, Florian Leitner, Alfonso Valencia. Analysis of biologicalprocesses and diseases using text mining approaches[J]. Bioinformatics Methods inClinical Research,2010,593(5):341-382.
    [154] Krallinger Martin, Carlos Rodriguez Penagos, Ashish Tendulkar, et al. PLAN2L: aweb tool for integrated text mining and literature-based bioentity relationextraction[J]. Nucleic Acids Research,2009,37(2):160-165.
    [155] Srinivasan P, Wedemeyer M. Mining concept profiles with the vector model orwhere on earth are diseases being studied[C]. In Proceedings of Text MiningWorkshop, the3thSIAM International Conference on Data Mining.2003.
    [156] Kanaka D Shetty, Siddhartha R Dala, Using information mining of the medicalliterature to improve drug safety[J]. Journal of American Medical InformaticsAssociation,2011,18(5):668-674.
    [157] Wisconsin, http://www.biostat.wisc.edu/~craven/ie/.
    [158] FetchProt, http://www.sics.se/humle/projects/fetchprot/Corpus/Release20051107/.
    [159] HIV-1ProteinInteraction,http://www.ncbi.nlm.nih.gov/RefSeq/HIVInteractions/index.html.
    [160] William Fu, Brigitte E Sanders Beer, Kenneth S Katz, et al. Humanimmunodeficiency virus type1, human protein interaction database at NCBI[J].Nucleic Acids Research,2009,37(Suppl1):417-422.
    [161] MINT, http://mint.bio.uniroma2.it/mint/Welcome.do.
    [162] Andrew Chatr-aryamontri, Arnaud Ceol, Luisa Montecchi Palazzi, et al. MINT: the Molecular INTeraction database[J]. Nucleic Acids Research,2007,35(Suppl 1):572-574.
    [163] DIP, http://dip.doe-mbi.ucla.edu/dip/Main.cgi.
    [164] Ioannis Xenarios, Lukasz Salwínski, Xiaoqun Joyce Duan. DIP, the Database of Interacting Proteins: a research tool for studying cellular networks of protein interactions[J]. Nucleic Acids Research,2002,30(1):303-305.
    [165] BIND, http://www.bind.ca.
    [166] Gary D Bader, Doron Betel, Christopher W V Hogue. BIND: the Biomolecular Interaction Network Database[J]. Nucleic Acids Research,2003,31(1):248-250.
    [167] Randall C Willis, Christopher W V Hogue. Searching, viewing, and visualizing data in the Biomolecular Interaction Network Database (BIND)[M]. Current Protocols in Bioinformatics,2006, Chapter 8.
    [168] HPRD, http://www.hprd.org/.
    [169] Suraj Peri, J Daniel Navarro, Ramars Amanchy, et al. Development of human protein reference database as an initial platform for approaching systems biology in humans[J]. Genome Research,2003,13(10):2363-2371.
    [170] MIPS, http://www.helmholtz-muenchen.de/en/mips/.
    [171] H W Mewes, S Dietmann, D Frishman. MIPS: analysis and annotation of genome information in 2007[J]. Nucleic Acids Research,2008,36(Suppl 1):196-201.
    [172] UniProtKB/SwissProt, http://www.expasy.ch/sprot/.
    [173] Amos Bairoch, Rolf Apweiler. The SWISS-PROT protein sequence database and its supplement TrEMBL in 2000[J]. Nucleic Acids Research,2000,28(1):45-48.
    [174] HPID, http://wilab.inha.ac.kr/hpid/.
    [175] Kyungsook Han, Byungkyu Park, Hyongguen Kim, et al. HPID: the Human Protein Interaction Database[J]. Bioinformatics,2004,20(15):2466-2470.
    [176] IntAct, http://www.ebi.ac.uk/intact/main.xhtml.
    [177] S Kerrien, Y Alam Faruque, B Aranda, et al. IntAct - open source resource for molecular interaction data[J]. Nucleic Acids Research,2007,35(Suppl 1):561-565.
    [178] STRING, http://string.embl.de/.
    [179] Lars J Jensen, Michael Kuhn, Manuel Stark, et al. STRING 8 - a global view on proteins and their functional interactions in 630 organisms[J]. Nucleic Acids Research,2009,37(Suppl 1):412-416.
    [180] PDZBase, http://icb.med.cornell.edu/services/pdz/start.
    [181] Thijs Beuming, Lucy Skrabanek, Masha Y Niv, et al. PDZBase: a protein-protein interaction database for PDZ-domains[J]. Bioinformatics,2005,21(6):827-828.
    [182] Reactome, http://www.reactome.org/.
    [183] Imre Vastrik, Peter D'Eustachio, Esther Schmidt, et al. Reactome: a knowledge base of biologic pathways and processes[J]. Genome Biology,2007,8(3):402-414.
    [184] MedEvi, http://www.ebi.ac.uk/Rebholz-srv/MedEvi/.
    [185] Jungjae Kim, Piotr Pezik, Dietrich Rebholz Schuhmann. MedEvi: retrieving textual evidence of relations between biomedical concepts from Medline[J]. Bioinformatics,2008,24(11):1410-1412.
    [186] PLAN2L, http://zope.bioinfo.cnio.es/plan2l/plan2l.html.
    [187] BioRAT, http://bioinf.cs.ucl.ac.uk/biorat/.
    [188] David P A Corney, Bernard F Buxton, William B Langdon, et al. BioRAT: extracting biological information from full-length papers[J]. Bioinformatics,2004,20(17):3206-3213.
    [189] MedScan, http://www.ariadnegenomics.com/technology-research/medscan/.
    [190] Novichkova S, Egorov S, Daraselia N. MedScan, a natural language processing engine for MEDLINE abstracts[J]. Bioinformatics,2003,19(13):1699-1706.
    [191] SherLoc, http://www-bs.informatik.uni-tuebingen.de/Services/SherLoc/.
    [192] Hagit Shatkay, Annette Höglund, Scott Brady, et al. SherLoc: high-accuracy prediction of protein subcellular localization by integrating text and protein sequence data[J]. Bioinformatics,2007,23(11):1410-1417.
    [193] BCMS, http://bcms.bioinfo.cnio.es/.
    [194] Florian Leitner, Martin Krallinger, Carlos Rodriguez Penagos, et al. Introducing meta-services for biomedical information extraction[J]. Genome Biology,2008,9(Suppl 2:S6):1-11.
    [195] PubGene, http://www.pubgene.org/.
    [196] Tor Kristian Jenssen, Astrid Lægreid, Jan Komorowski, et al. A literature network of human genes for high-throughput analysis of gene expression[J]. Nature Genetics,2001,28(1):21-28.
    [197] iHOP, http://www.ihop-net.org/UniPub/iHOP/.
    [198] Robert Hoffmann, Alfonso Valencia. A gene network for navigating the literature[J]. Nature Genetics,2004,36(7):664.
    [199] Chilibot, http://www.chilibot.net/.
    [200] Hao Chen, Burt M Sharp. Content-rich biological network constructed by mining PubMed abstracts[J]. BMC Bioinformatics,2004,5(147):1-15.
    [201] KinasePathway, http://kinasedb.ontology.ims.u-tokyo.ac.jp:8081/.
    [202] Asako Koike, Yoshiyuki Kobayashi, Toshihisa Takagi. Kinase pathway database: an integrated protein-kinase and NLP-based protein-interaction resource[J]. Genome Research,2003,13(6A):1231-1243.
    [203] G2D, http://www.ogic.ca/projects/g2d_2.
    [204] Carolina Perez Iratxeta, Peer Bork, Miguel A Andrade. Association of genes to genetically inherited diseases using data mining[J]. Nature Genetics,2002,31(3):316-319.
    [205] Anna Korhonen, Diarmuid Ó Séaghdha, Ilona Silins, et al. Text Mining for Literature Review and Knowledge Discovery in Cancer Risk Assessment and Research[J]. PLoS One,2012,7(4):20-36.
    [206] Seungyoon Nam, Taesung Park. Pathway-based evaluation in early onset colorectal cancer suggests focal adhesion and immunosuppression along with epithelial-mesenchymal transition[J]. PLoS One,2012,7(4):1-14.
    [207] Urzua U, Owens G A, Zhang G M, et al. Tumor and reproductive traits are linked by RNA metabolism genes in the mouse ovary: a transcriptome-phenotype association analysis[J]. BMC Genomics,2010,11(Suppl 5:S1):1-16.
    [208] Jiao Li, Xiaoyan Zhu, Jake Yue Chen. Building disease-specific drug-protein connectivity maps from molecular interaction networks and PubMed abstracts[J]. PLoS Computational Biology,2009,5(7):1-22.
    [209] Arrowsmith, http://arrowsmith.psych.uic.edu/cgi-bin/arrowsmith_uic/start.cgi.
    [210] Neil R Smalheiser, Vetle I Torvik, Wei Zhou. Arrowsmith two-node search interface: a tutorial on finding meaningful links between two disparate sets of articles in MEDLINE[J]. Computer Methods and Programs in Biomedicine,2009,94:190-197.
    [211] BITOLA, http://ibmi.mf.uni-lj.si/bitola/.
    [212] Hristovski D, Peterlin B, Mitchell J A, et al. Improving literature based discovery support by genetic knowledge integration[J]. Studies in Health Technology and Informatics,2003,95:68-73.
    [213] Hristovski D, Peterlin B, Mitchell J A, et al. Using literature-based discovery to identify disease candidate genes[J]. International Journal of Medical Informatics,2005,74:289-298.
    [214] BITOLA, http://ibmi.mf.uni-lj.si/bitola/.
    [215] Maté Ongenaert, Leander Van Neste, Tim De Meyer, et al. PubMeth: a cancer methylation database combining text-mining and expert annotation[J]. Nucleic Acids Research,2008,36(Suppl 1):842-846.
    [216] Yu Ching Fang, Po Ting Lai, Hong Jie Dai, et al. MeInfoText 2.0: gene methylation and cancer relation extraction from biomedical literature[J]. BMC Bioinformatics,2011,12(471):1-8.
    [217] Xutao Deng, Huimin Geng, Dhundy R Bastola, et al. Link test - A statistical method for finding prostate cancer biomarkers[J]. Computational Biology and Chemistry,2006,30(6):425-433.
    [218] OMIM, http://www.omim.org/.
    [219] Natarajan J, Berrar D, Dubitzky W, et al. Text mining of full-text journal articles combined with gene expression analysis reveals a relationship between sphingosine-1-phosphate and invasiveness of a glioblastoma cell line[J]. BMC Bioinformatics,2006,7(373):1-16.
    [220] Kolluru B, Nakjang S, Hirt R P, et al. Automatic extraction of microorganisms and their habitats from free text using text mining workflows[J]. Journal of Integrative Bioinformatics,2011,8(2):2011-2184.
    [221] Yun Xu, Da Teng, Yiming Lei. MinePhos: A literature mining system for protein phosphorylation information extraction[J]. IEEE/ACM Transactions on Computational Biology and Bioinformatics,2012,9(1):311-315.
    [222] Adriano Barbosa Silva, Jean Fred Fontaine, Elisa R Donnard, et al. PESCADOR, a web-based tool to assist text-mining of biointeractions extracted from PubMed queries[J]. BMC Bioinformatics,2011,12(435):1-9.
    [223] W G Guo, Y Zhang, D Ge, et al. Bioinformatics analyses combined microarray identify the desregulated MicroRNAs in lung cancer[J]. European Review for Medical and Pharmacological Sciences,2013,17(11):1509-1516.
    [224] Xiao Li, Haoyang Cai, Jiabao Xu, et al. A mouse protein interactome through combined literature mining with multiple sources of interaction evidence[J]. Amino Acids,2010,38(4):1237-1252.
    [225] Adrian Benton, Shawndra Hill, Lyle Ungar, et al. A system for de-identifying medical message board text[J]. BMC Bioinformatics,2011,12(Suppl 3:S2):1-10.
    [226] You M, Zhao R W, Li G Z, et al. MAPLSC: a novel multi-class classifier for medical diagnosis[J]. International Journal of Data Mining and Bioinformatics,2011,5(4):383-401.
    [227] Shin M, Lee H, Hong M. A hybrid approach to gene ranking using gene relation networks derived from literature for the identification of disease gene markers[J]. International Journal of Data Mining and Bioinformatics,2012,6(3):239-254.
    [228] Paul Thompson, John McNaught, Simonetta Montemagni, et al. The BioLexicon: a large-scale terminological resource for biomedical text mining[J]. BMC Bioinformatics,2011,12(397):1-40.
    [229] Paul Thompson, Raheel Nawaz, John McNaught, et al. Enriching a biomedical event corpus with meta-knowledge annotation[J]. BMC Bioinformatics,2011,12:393.
    [230] Maqungo M, Kaur M, Kwofie S K, et al. DDPC: dragon database of genes associated with prostate cancer[J]. Nucleic Acids Research,2011,39(29):1-10.
    [231] Lishan Wang, Yuanyuan Xiong, Yihua Sun, et al. HLungDB: an integrated database of human lung cancer research[J]. Nucleic Acids Research,2010,38:665-669.
    [232] Paul Mayer, Bernd Mayer, Gert Mayer. Systems Biology: Building a Useful Model from Multiple Markers and Profiles[J]. Nephrology Dialysis Transplantation,2012,27(11):3995-4002.
    [233] Umay Kulsum, Vishwadeep Singh, Sujata Sharma, et al. RASOnD - a comprehensive resource and search tool for RAS superfamily oncogenes from various species[J]. BMC Genomics,2011,12(341):1-18.
    [234] Christos Andronis, Anuj Sharma, Vassilis Virvilis, et al. Literature mining, ontologies and information visualization for drug repurposing[J]. Briefings in Bioinformatics,2011,12(4):357-368.
    [235] Ando M, Morita T, O'Connor S J. Primary concerns of advanced cancer patients identified through the structured life review process: a qualitative study using a text mining technique[J]. Palliative and Supportive Care,2007,5:265-271.
    [236] Ahmed J, Meinel T, Dunkel M, et al. CancerResource: a comprehensive database of cancer-relevant proteins and compound interactions supported by experimental knowledge[J]. Nucleic Acids Research,2011,39:960-967.
    [237] Thorsten Joachims. Training linear SVMs in linear time[C]. In: Proceedings of KDD 2006, Philadelphia, Pennsylvania, USA,2006, pp.1-10.
    [238] Chih-Chung Chang, Chih Jen Lin. LIBSVM: a library for support vector machines[J]. ACM Transactions on Intelligent Systems and Technology,2(27):1-27.
    [239] Zhenfei Ju, Jiang Wang, Fei Zhu. Named Entity Recognition from Biomedical Text Using SVM[C]. In: Proceedings of the 5th International Conference on Bioinformatics and Biomedical Engineering,2011, pp.1-4.
    [240] Chih Wei Hsu, Chih Jen Lin. A Comparison of Methods for Multiclass Support Vector Machines[J]. IEEE Transactions on Neural Networks,2002,13(2):415-425.
    [241] John Lafferty, Andrew McCallum, Fernando C N Pereira. Conditional Random Fields: Probabilistic Models for Segmenting and Labeling Sequence Data[C]. In: Proceedings of the 18th International Conference on Machine Learning,2001, pp.282-289.
    [242] Christopher M Bishop. Pattern Recognition And Machine Learning[M]. Springer,2007.
    [243] Birger Hjørland. The foundation of the concept of relevance[J]. Journal of the American Society for Information Science and Technology,2010,61(2):217-237.
    [244] Kelly H Zou, A James O'Malley, Laura Mauri. Receiver-operating characteristic analysis for evaluating diagnostic tests and predictive models[J]. Circulation,2007,115:654-657.
    [245] Jorge M Lobo, Alberto Jiménez-Valverde, Raimundo Real. AUC: a misleading measure of the performance of predictive distribution models[J]. Global Ecology and Biogeography,2008,17(2):145-151.
    [246] Jeong H, Tombor B, Albert R, et al. The large-scale organization of metabolic networks[J]. Nature,2000,407:651-654.
    [247] Hongwu Ma, An Ping Zeng. Reconstruction of metabolic networks from genome data and analysis of their global structure for various organisms[J]. Bioinformatics,2003,19(2):270-277.
    [248] http://www.genome.ad.jp/kegg.
    [249] http://biocyc.org/ecocyc.
    [250] Barabási A L, Albert R. Emergence of Scaling in Random Networks[J]. Science,1999,286(5439):509-512.
    [251] P Erdős, A Rényi. On random graphs[J]. Publicationes Mathematicae,1959,6:290-297.
    [252] http://blog.sciencenet.cn/upload/blog/images/2010/6/201062292129932.gif.
    [253] Duncan J Watts, Steven H Strogatz. Collective dynamics of small-world networks[J]. Nature,1998,393:440-442.
    [254] http://blog.sciencenet.cn/upload/blog/images/2010/6/201062292153979.gif.
    [255] http://blog.sciencenet.cn/upload/blog/images/2010/6/201062292214729.gif.
    [256] W Hayes, K Sun, N Przulj. Graphlet-based measures are suitable for biological network comparison[J]. Bioinformatics,2013,29(4):483-491.
    [257] Marc Vidal, Michael E Cusick, Albert-László Barabási. Interactome Networks and Human Disease[J]. Cell,2011,144(6):986-998.
    [258] Richard Durrett. Random graph dynamics[M]. Cambridge University Press,2007.
    [259] Van Hasselt H. Reinforcement Learning: State of the Art[M]. Berlin: Springer,2007,207-251.
    [260] X Guo, O Hernández-Lerma. Continuous-Time Markov Decision Processes[M]. Springer,2009.
    [261] Busoniu L, Babuska R, De Schutter B, et al. Reinforcement learning and dynamic programming using function approximators[M]. New York: CRC Press,2010.
    [262] Singh S. Transfer of Learning by Composing Solutions of Elemental Sequential Tasks[J]. Machine Learning,1992,8(3/4):323-339.
    [263] Crites R H, Barto A G. Elevator Group Control using Multiple Reinforcement Learning Agents[J]. Machine Learning,1998,33:235-262.
    [264] Tham C K, Prager R W. A modular Q-Learning Architecture for Manipulator Task Decomposition[C]. In: Proceedings of the 11th International Conference on Machine Learning, San Francisco,1994.
    [265] Littman M L, Ravi N, Fenson E, Howard R. Reinforcement Learning for Autonomic Network Repair[C]. In: Proceedings of the 1st International Conference on Autonomic Computing, Rome,2004.
    [266] Kamio S, Iba H. Adaptation Technique for Integrating Genetic Programming and Reinforcement Learning for Real Robots[J]. IEEE Transactions on Evolutionary Computation,2005,9(3):318-333.
    [267] Abbeel P, Coates A, Quigley M, Ng A Y. An Application of Reinforcement Learning to Aerobatic Helicopter Flight[C]. In: Proceedings of the 19th Annual Conference on Neural Information Processing Systems, Vancouver,2007.
    [268] Arora P, Zheng R, Szepesvari Cs. Sequential Learning for Optimal Monitoring of Multi-channel Wireless Networks[C]. In: Proceedings of INFOCOM, Orlando,2011.
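    [269] Cai Qingsheng, Zhang Bo. Research on a reinforcement learning model based on agent teams and its application[J]. Journal of Computer Research and Development,2000,37(9):1087-109. (in Chinese)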
    [270] Gao Y, Li N, Lu Xin, Chen Shifu. A Novel Learning Classifier System Based on Reinforcement Learning[C]. In: Proceedings of the 7th International Conference on Control and Automation, Xiamen,2002.
    [271] Chen C L, Dong D Y, Chen Z H. Quantum Computation for Action Selection Using Reinforcement Learning[J]. International Journal of Quantum Information,2006,4(6):1071-1083.
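    [272] Wang Xingce, Zhang Rubo, Gu Guochang. Research on reinforcement learning algorithms for dynamic formation of multiple robots[J]. Journal of Computer Research and Development,2003,40(10):1444-1450. (in Chinese)
    [273] Chen Yangzhou, Zhang Hui, Yang Yuzhen, Hu Quanlian. Application of Q-learning based agents to traffic control at a single intersection[J]. Journal of Highway and Transportation Research and Development,2007(5):117-120. (in Chinese)
    [274] Liu Tao, He Yanxiang, Xiong Qi. A Q-learning based real-time defense mechanism against LDoS attacks and its CPN implementation[J]. Journal of Computer Research and Development,2011,48(3):432-439. (in Chinese)
    [275] Zhang Yun, Wang Jingyu, Chen Zhi, et al. Research on a Q-learning based self-organization method for wireless sensor networks[J]. Chinese Journal of Sensors and Actuators,2010,25(11):1623-1626. (in Chinese)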
    [276] Quan L, Zhong L, LianXia X, MingHua L. The Research on the Spider of the Vertical Search Based on Reinforcement Learning[J]. Journal of Computational Information Systems,2008,4(1):83-90.
    [277] Precup D, Sutton R S, Dasgupta S. Off-Policy Temporal-Difference Learning with Function Approximation[C]. In: Proceedings of the 18th International Conference on Machine Learning, Williamstown,2001.
    [278] Geramifard A, Bowling M, Sutton R S. Incremental Least-Square Temporal Difference Learning[C]. In: Proceedings of the 21st National Conference on Artificial Intelligence, Boston,2006.
    [279] Sutton R S, Szepesvari Cs, Maei H R. A convergent O(n) Algorithm for Off-Policy Temporal-Difference Learning with Linear Function Approximation[C]. In: Proceedings of the 22nd Annual Conference on Neural Information Processing Systems, Vancouver,2008.
    [280] Sutton R S, Maei H R, Precup D, et al. Fast Gradient-Descent Methods for Temporal-Difference Learning with Linear Function Approximation[C]. In: Proceedings of the 26th International Conference on Machine Learning, Montreal,2009.
    [281] Maei H R, Szepesvari Cs, Bhatnagar S, Sutton R S. Toward off-policy Learning Control with Function Approximation[C]. In: Proceedings of the 27th International Conference on Machine Learning, Haifa,2010.
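    [282] Wang Xuesong, Zhang Yiyang, Cheng Yuhu. Reinforcement learning in continuous spaces based on a Gaussian process classifier[J]. Acta Electronica Sinica,2009,37(6):1153-1158. (in Chinese)
    [283] Gao Yang, Zhou Ruyi, Wang Hao, Cao Zhixin. Research on average reward reinforcement learning algorithms[J]. Chinese Journal of Computers,2007,30(8):1372-1378. (in Chinese)
    [284] Shi Chuan, Shi Zhongzhi, Wang Maoguang. An online hierarchical reinforcement learning method based on path matching[J]. Journal of Computer Research and Development,2008,45(9):1470-1476. (in Chinese)
    [285] Tao Junyuan, Sun Jinwei, Li Desheng. A function estimation algorithm for reinforcement learning based on linear averaging[J]. Journal of Jilin University (Engineering and Technology Edition),2008,45(6):1407-1411. (in Chinese)
    [286] Zhao Yun, Chen Qingwei, Hu Weili. A reinforcement learning algorithm based on information entropy[J]. Systems Engineering and Electronics,2010,32(5):1043-1046. (in Chinese)
    [287] Liu Quan, Fu Qiming, Gong Shengrong, Fu Yuchen, Cui Zhiming. An average-reward reinforcement learning method with minimal state variables[J]. Journal on Communications,2011,32(1):66-71. (in Chinese)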
    [288] Dearden R, Friedman N, Andre D. Model-based Bayesian Exploration[C]. In: Proceedings of the Fifteenth Conference on Uncertainty in Artificial Intelligence, Stockholm,1999.
    [289] Paduraru C, Kaplow R, Precup D, Pineau J. Model-based Reinforcement Learning with State Aggregation[C]. In: Proceedings of the 6th European Workshop on Reinforcement Learning, Boston,2008.
    [290] Busoniu L, Ernst D, De Schutter B, Babuska R. Approximate Dynamic Programming with a Fuzzy Parametrization[J]. Automatica,2010,46(5):804-814.
    [291] Ramachandran D, Amir E. Bayesian Inverse Reinforcement Learning[C]. In: Proceedings of the 20th International Joint Conference on Artificial Intelligence, Hyderabad,2007.
    [292] Waldock A, Carse B. Fuzzy Q-Learning with an Adaptive Representation[C]. In: Proceedings of the 2008 IEEE International Conference on Fuzzy Systems, Sofia,2008.
    [293] Ming Liang X, WenBo X. Fuzzy Q-Learning in Continuous State and Action Space[J]. The Journal of China Universities of Posts and Telecommunications,2010,14(17):100-109.
    [294] Beigi A, Parvin H, Mozayani N, Minaei B. Improving Reinforcement Learning Agents Using Genetic Algorithms[J]. Lecture Notes in Computer Science,2010,6335:330-337.
    [295] DongHyun L, VoVan Q, Sungho J, JuJang L. Online Support Vector Regression based Value Function Approximation for Reinforcement Learning[C]. In: Proceedings of the 2009 International Symposium on Industrial Electronics, Seoul,2009.
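    [296] Wang Xuesong, Tian Xilan, Cheng Yuhu, Ma Xiaoping. Application of least squares support vector machines in reinforcement learning systems[J]. Journal of System Simulation,2008,20(14):3702-3706. (in Chinese)
    [297] Wang Xuesong, Tian Xilan, Cheng Yuhu, Yi Jianqiang. Q-learning based on cooperative least squares support vector machines[J]. Acta Automatica Sinica,2009,35(2):214-219. (in Chinese)
    [298] Liu Quan, Gao Yang, Chen Daoxu, Sun Jigui, Yao Wangshu. A logical reinforcement learning method based on a heuristic contour list[J]. Journal of Computer Research and Development,2008,45(11):1824-1830. (in Chinese)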
    [299] Kehoe E J, Ludvig E A, Sutton R S. Magnitude and timing of conditioned responses in delay and trace classical conditioning of the nictitating membrane response of the rabbit (Oryctolagus cuniculus)[J]. Behavioral Neuroscience,2009,123(5):1095-1101.
    [300] Cutumisu M, Szafron D, Bowling M, Sutton R S. Agent learning using action-dependent learning rates in computer role-playing games[C]. In: Proceedings of the 4th Artificial Intelligence and Interactive Digital Entertainment Conference, Stanford,2008.
    [301] Bhatnagar S, Sutton R S, Ghavamzadeh M, et al. Natural actor-critic algorithms[J]. Automatica,2009,45(11):2471-2482.
    [302] F Hoppe. Polya-like urns and the Ewens sampling formula[J]. Journal of Mathematical Biology,1984,20:91-94.
    [303] P Diaconis, D Freedman. Finite exchangeable sequences[J]. Annals of Probability,1980,8(4):745-764.
    [304] G Grimmett, D Stirzaker. Probability and Random Processes[M]. Oxford University Press,2001.
    [305] http://en.wikipedia.org/wiki/Impact_factor.
