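In recent years, a growing body of research has shown that RNA plays a very important role in life processes. RNA is not only an intermediate carrier of genetic information in cellular organisms, but also performs other important functions, such as regulating gene expression, catalyzing mRNA splicing, and processing and modifying RNA precursors. Research on RNA molecules has therefore long been an important area of bioinformatics. The functions of different RNAs are closely related to their molecular structures, so further exploration of these functions requires knowledge of RNA secondary structure. Because RNA molecules are difficult to crystallize and degrade quickly, determining RNA three-dimensional structure by nuclear magnetic resonance (NMR), X-ray crystallography, or other conventional experimental methods is costly and time-consuming. Although conventional experimental methods determine RNA structure more accurately and reliably, their high cost means they cannot keep pace with today's massive volume of data. Predicting RNA secondary structure computationally, using algorithms and mathematical methods, has therefore become the generally accepted main approach.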
In recent years, a growing body of research has shown that RNA plays a very important role in life processes. RNA molecules are not only carriers of genetic information in living cells but also have other important functions, such as regulating gene expression, catalyzing mRNA splicing, and processing and modifying RNA precursors. Research on RNA molecules has therefore long been one of the important fields of bioinformatics. The functions of RNA molecules are closely related to their structures, so exploring these functions further requires knowledge of RNA secondary structure. The most accurate methods are X-ray crystallography and nuclear magnetic resonance, but they are difficult to apply: they are expensive and slow, and most RNA molecules cannot currently be crystallized. Therefore, the generally accepted main approach is computational prediction using a variety of algorithms.
     In this thesis, we study methods for RNA secondary structure prediction in depth. They include base-pair maximization algorithms and methods based on the principle of thermodynamic energy minimization (such as Zuker's mfold method); comparative sequence analysis methods (such as covariation models and stochastic context-free grammar algorithms); and heuristic algorithms (such as genetic algorithms and simulated annealing). By studying these methods, we summarize their respective advantages and disadvantages and arrive at the research ideas for a new prediction method, which lays a solid theoretical foundation for the rest of this work.
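To make the base-pair maximization idea concrete, the following is a minimal Python sketch of the classic Nussinov-style dynamic program, which fills N(i, j) = max(N(i, j-1), max over k of N(i, k-1) + N(k+1, j-1) + 1) for every base k that can pair with base j. It is an illustration, not the implementation used in this thesis; the function name `nussinov`, the allowed pairs (Watson-Crick plus G-U wobble) and the minimum hairpin loop of 3 nucleotides are assumptions.

```python
def nussinov(seq, min_loop=3):
    """Return (max_pairs, pairs) via base-pair maximization (illustrative sketch)."""
    n = len(seq)
    # Assumed pairing rules: Watson-Crick pairs plus the G-U wobble pair.
    allowed = {("A", "U"), ("U", "A"), ("G", "C"), ("C", "G"), ("G", "U"), ("U", "G")}
    dp = [[0] * n for _ in range(n)]
    # Fill by increasing span; spans <= min_loop cannot contain a pair.
    for span in range(min_loop + 1, n):
        for i in range(n - span):
            j = i + span
            best = dp[i][j - 1]  # case 1: base j is unpaired
            for k in range(i, j - min_loop):  # case 2: j pairs with k
                if (seq[k], seq[j]) in allowed:
                    left = dp[i][k - 1] if k > i else 0
                    best = max(best, left + dp[k + 1][j - 1] + 1)
            dp[i][j] = best
    # Traceback to recover one optimal set of pairs.
    pairs, stack = [], [(0, n - 1)]
    while stack:
        i, j = stack.pop()
        if i >= j:
            continue
        if dp[i][j] == dp[i][j - 1]:
            stack.append((i, j - 1))
            continue
        for k in range(i, j - min_loop):
            if (seq[k], seq[j]) in allowed:
                left = dp[i][k - 1] if k > i else 0
                if left + dp[k + 1][j - 1] + 1 == dp[i][j]:
                    pairs.append((k, j))
                    stack.append((i, k - 1))
                    stack.append((k + 1, j - 1))
                    break
    return (dp[0][n - 1] if n else 0), sorted(pairs)


print(nussinov("GGGAAAUCC"))
```

Maximizing the number of pairs ignores stacking energies, which is why energy-minimization methods such as mfold replace the "+1" with nearest-neighbor free-energy terms.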
     First, we use a least squares support vector machine (LS-SVM), combined with principal component analysis (PCA), to predict tRNA. Compared with the traditional support vector machine (SVM), LS-SVM replaces the inequality constraints with equality constraints, so training requires solving a set of linear equations instead of a quadratic programming problem. PCA is commonly used to extract features from high-dimensional data; we use it to analyze the statistical features of nucleotide sequences and then use the LS-SVM to predict ncRNA. The results indicate that the proposed method is suitable for prokaryotic ncRNA prediction.
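The key property claimed for LS-SVM is that training reduces to one linear system. The following pure-Python sketch builds and solves the standard LS-SVM classifier system [[0, y^T], [y, Omega + I/gamma]] [b; alpha] = [0; 1], where Omega_ij = y_i y_j K(x_i, x_j), and classifies with sign(sum_i alpha_i y_i K(x, x_i) + b). Assumptions: an RBF kernel, hypothetical hyperparameters `gamma` and `sigma`, and toy 2-D vectors standing in for PCA-reduced sequence features; the PCA step itself is omitted.

```python
import math


def rbf_kernel(x, z, sigma=1.0):
    """Gaussian RBF kernel K(x, z) = exp(-||x - z||^2 / (2 sigma^2))."""
    return math.exp(-sum((a - b) ** 2 for a, b in zip(x, z)) / (2.0 * sigma ** 2))


def solve_linear(A, rhs):
    """Solve A x = rhs by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [v] for row, v in zip(A, rhs)]  # augmented matrix
    for c in range(n):
        p = max(range(c, n), key=lambda r: abs(M[r][c]))  # pivot row
        M[c], M[p] = M[p], M[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n + 1):
                M[r][k] -= f * M[c][k]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (M[r][n] - sum(M[r][k] * x[k] for k in range(r + 1, n))) / M[r][r]
    return x


def lssvm_train(X, y, gamma=10.0, sigma=1.0):
    """Return (b, alpha) for labels y in {-1, +1} by solving a single linear system."""
    n = len(X)
    A = [[0.0] * (n + 1) for _ in range(n + 1)]
    for i in range(n):
        A[0][i + 1] = A[i + 1][0] = float(y[i])
        for j in range(n):
            A[i + 1][j + 1] = y[i] * y[j] * rbf_kernel(X[i], X[j], sigma)
        A[i + 1][i + 1] += 1.0 / gamma  # ridge term from the squared-error loss
    sol = solve_linear(A, [0.0] + [1.0] * n)
    return sol[0], sol[1:]


def lssvm_predict(x, X, y, b, alpha, sigma=1.0):
    """Classify x as -1 or +1 using the trained LS-SVM."""
    score = b + sum(a * yi * rbf_kernel(x, xi, sigma) for a, yi, xi in zip(alpha, y, X))
    return 1 if score >= 0 else -1


X = [[0.0, 0.0], [0.0, 1.0], [3.0, 3.0], [3.0, 4.0]]
y = [-1, -1, 1, 1]
b, alpha = lssvm_train(X, y)
print(lssvm_predict([0.2, 0.3], X, y, b, alpha))
```

Because every training point gets a nonzero alpha, LS-SVM loses the sparsity of a standard SVM; the trade-off is that a direct linear solve replaces an iterative quadratic programming solver.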
     Second, to improve on the performance of the recently published IPSO, we propose PSOfold, a particle swarm optimization (PSO) algorithm for RNA secondary structure prediction. To enhance its ability to search for the optimal solution, fuzzy logic control is applied to adaptively adjust the PSO parameters: the inertia weight, the learning factors, and the number of particles.
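As a rough illustration of fuzzy parameter control in PSO, the following sketch adapts only the inertia weight; the learning factors and the number of particles are left fixed. It assumes a one-input, zero-order Sugeno controller whose input is the fraction of the last `window` iterations in which the global best did not improve, three triangular fuzzy sets (LOW, MEDIUM, HIGH) with rule outputs -0.05, 0 and +0.05, and a continuous sphere function standing in for the RNA folding objective. All names and constants (`fuzzy_delta_w`, `fuzzy_pso`, the bounds, `window`) are hypothetical and are not taken from PSOfold.

```python
import random
from collections import deque


def fuzzy_delta_w(stagnation):
    """Zero-order Sugeno inference: stagnation ratio in [0, 1] -> change in inertia weight."""
    low = max(0.0, 1.0 - stagnation / 0.5)               # triangular set peaking at 0
    med = max(0.0, 1.0 - abs(stagnation - 0.5) / 0.5)    # triangular set peaking at 0.5
    high = max(0.0, (stagnation - 0.5) / 0.5)            # triangular set peaking at 1
    # Rules: improving -> lower w (exploit); stagnating -> raise w (explore).
    weights, outputs = (low, med, high), (-0.05, 0.0, 0.05)
    return sum(m * o for m, o in zip(weights, outputs)) / sum(weights)


def sphere(x):
    """Stand-in objective (assumption); PSOfold optimizes an RNA-specific objective."""
    return sum(v * v for v in x)


def fuzzy_pso(f, dim, n_particles=20, iters=200, lo=-5.0, hi=5.0,
              c1=1.5, c2=1.5, w_min=0.4, w_max=0.9, window=10, seed=0):
    rng = random.Random(seed)
    vmax = 0.2 * (hi - lo)  # velocity clamp
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    pbest = [p[:] for p in pos]
    pbest_val = [f(p) for p in pos]
    g = min(range(n_particles), key=lambda i: pbest_val[i])
    gbest, gbest_val = pbest[g][:], pbest_val[g]
    w = w_max
    history = deque(maxlen=window)  # 1 = global best did not improve that iteration
    for _ in range(iters):
        improved = False
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                v = (w * vel[i][d] + c1 * r1 * (pbest[i][d] - pos[i][d])
                     + c2 * r2 * (gbest[d] - pos[i][d]))
                vel[i][d] = max(-vmax, min(vmax, v))
                pos[i][d] = max(lo, min(hi, pos[i][d] + vel[i][d]))
            val = f(pos[i])
            if val < pbest_val[i]:
                pbest[i], pbest_val[i] = pos[i][:], val
                if val < gbest_val:
                    gbest, gbest_val = pos[i][:], val
                    improved = True
        history.append(0 if improved else 1)
        stagnation = sum(history) / len(history)
        w = max(w_min, min(w_max, w + fuzzy_delta_w(stagnation)))  # fuzzy update, clamped
    return gbest, gbest_val


best_pos, best_val = fuzzy_pso(sphere, dim=2)
print(best_val)
```

The same controller pattern could, in principle, drive the learning factors or the swarm size by adding outputs to the rule base; that extension is not shown here.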
     Finally, to further address the stem permutation problem, we put forward a solution conversion strategy that transforms the discrete values of stems into an ordered stem combination. Experimental results show that, compared with other methods based on evolutionary algorithms and swarm intelligence algorithms, our method is effective for RNA folding in terms of sensitivity, specificity, and F-measure.
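This section does not give the exact conversion rule, so the following sketch assumes a random-key (ranked-order) scheme. Each particle holds one real-valued priority per candidate stem. Stems are sorted by priority, and each stem is greedily kept only if it shares no base with the stems already kept and does not cross them (no pseudoknots). The sketch also scores a prediction against a reference structure using base-pair sensitivity TP/(TP+FN), "specificity" in the sense common in RNA folding studies, TP/(TP+FP) (also called PPV), and their harmonic mean, the F-measure; the thesis' exact metric definitions are not stated here. The stem tuples `(i, j, length)` and all function names are hypothetical.

```python
def stem_pairs(stem):
    """Expand a hypothetical stem (i, j, length) into base pairs (i+k, j-k)."""
    i, j, length = stem
    return [(i + k, j - k) for k in range(length)]


def compatible(p, q):
    """True if pairs p and q share no base and do not cross (no pseudoknot)."""
    (a, b), (c, d) = p, q
    if len({a, b, c, d}) < 4:
        return False
    return not (a < c < b < d or c < a < d < b)


def decode(keys, stems):
    """Random-key decoding: rank stems by key, then greedily keep compatible ones."""
    order = sorted(range(len(stems)), key=lambda s: keys[s], reverse=True)
    chosen, chosen_pairs = [], []
    for s in order:
        new = stem_pairs(stems[s])
        if all(compatible(p, q) for p in new for q in chosen_pairs):
            chosen.append(s)
            chosen_pairs.extend(new)
    return chosen, set(chosen_pairs)


def scores(predicted, reference):
    """Return (sensitivity, specificity as PPV, F-measure) over base pairs."""
    tp = len(predicted & reference)
    sens = tp / len(reference) if reference else 0.0
    ppv = tp / len(predicted) if predicted else 0.0
    f = 2 * sens * ppv / (sens + ppv) if sens + ppv else 0.0
    return sens, ppv, f


stems = [(0, 20, 3), (3, 10, 2), (5, 25, 2)]
print(decode([0.9, 0.5, 0.7], stems)[0])  # stem 2 crosses stem 0 and is rejected
print(decode([0.1, 0.5, 0.9], stems)[0])  # stem 2 now blocks both other stems
```

Changing only the keys changes which stems survive the greedy pass, which illustrates why the order in which stems are combined matters for stem-based search.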
