Several Problems in the Analysis of Biological Magnetic Resonance Data
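Abstract
Nuclear magnetic resonance (NMR) is widely used in physics, chemistry, biology, medicine, and many other scientific fields. Biological magnetic resonance techniques provide methodological support for life-science research at the molecular, cellular, and whole-organism levels, including the analysis of protein structure and dynamics and metabolomic analysis. This thesis works at the intersection of mathematics and biological magnetic resonance: starting from NMR experimental data and experimental phenomena arising in metabolomics and in protein dynamics, it carries out data analysis and mathematical modeling.
     The thesis has six chapters, centered on the theory of data analysis and mathematical modeling and its application to biological magnetic resonance. Chapter 1 briefly introduces the background on biological magnetic resonance data analysis that the thesis requires.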
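     Chapter 2 studies the central limit theorem for the Rényi relative entropy and its rate of convergence. By estimating the Rényi relative entropy between the normalized sum of zero-mean independent and identically distributed (i.i.d.) random variables and the normal distribution, it proves a central limit theorem for the Rényi relative entropy of order α (0<α<1) and obtains the exact rate at which the Rényi relative entropy between the normalized sum Zn of i.i.d. random variables and a standard normal random variable G converges. Building on this, it gives a method for testing whether a residual sequence is close to zero-mean i.i.d., which provides a theoretical basis for model selection and model diagnostics.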
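For orientation, the Rényi relative entropy (Rényi divergence) of order α between two densities is usually defined as in the first display below. The second display is a sketch of the form such a central limit theorem presumably takes. It assumes the summands X_i have mean zero and finite variance σ², and the symbols p_{Z_n} (density of Zn) and φ (standard normal density) are introduced here for illustration, not taken from the thesis.

```latex
\[
D_\alpha(p \,\|\, q) \;=\; \frac{1}{\alpha - 1}\,
  \log \int p(x)^{\alpha}\, q(x)^{1-\alpha}\, dx ,
  \qquad 0 < \alpha < 1 .
\]
\[
Z_n \;=\; \frac{X_1 + \cdots + X_n}{\sigma \sqrt{n}},
\qquad
D_\alpha\bigl(p_{Z_n} \,\|\, \varphi\bigr) \;\longrightarrow\; 0
\quad (n \to \infty).
\]
```

As α tends to 1, D_α tends to the Kullback-Leibler divergence, which is the quantity used in the classical entropic central limit theorem.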
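     Chapter 3 develops a new normalization method for one-dimensional spectral data in metabolomics: Clustering Partial Integral Normalization (CPIN). Normalization is essentially the search for a reasonable reference standard against which changes in metabolites can be measured. We obtain candidate reference groups by hierarchical clustering, balance the similarity and the consistency of each reference group, and use OPLS to improve consistency. We describe the CPIN procedure and its rationale in detail and demonstrate its effectiveness on two data sets.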
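The core arithmetic of normalizing against a reference group of bins can be sketched as follows. This is a minimal illustration and not the CPIN procedure itself: the clustering and OPLS steps that select the reference group are not shown, and the function name `partial_integral_normalize`, the toy spectra, and the choice `reference_bins = [1, 3]` are all hypothetical.

```python
import numpy as np

def partial_integral_normalize(spectra, reference_bins):
    """Divide each spectrum (row) by its integral over the reference bins."""
    spectra = np.asarray(spectra, dtype=float)
    # Partial integral: sum of intensities over the chosen reference bins only.
    partial = spectra[:, reference_bins].sum(axis=1, keepdims=True)
    if np.any(partial <= 0):
        raise ValueError("reference bins must have a positive integral in every sample")
    return spectra / partial

# Toy data: one underlying profile measured at three different dilutions.
profile = np.array([4.0, 1.0, 3.0, 2.0, 5.0])
dilution = np.array([1.0, 0.5, 2.0])
spectra = dilution[:, None] * profile

# Hypothetical reference group; in practice it would come from the selection step.
reference_bins = [1, 3]
normalized = partial_integral_normalize(spectra, reference_bins)
```

In this toy case the three rows of `normalized` coincide, because the only difference between the samples was the dilution factor that the reference integral removes.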
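     Chapter 4 examines dimensionality reduction and visualization for one-dimensional metabolomic spectra, using kernel methods to extend standard linear dimensionality-reduction methods to their nonlinear counterparts. It first gives rigorous mathematical derivations of linear methods commonly used for NMR data (such as PCA, LDA, PLS, and OPLS). It then combines them with the kernel trick to extend linear PLS and OPLS to the empirical kernel space. Finally, it uses a set of real NMR data to show the dimensionality-reduction performance of these methods and how the kernel-function parameter settings affect classification and dimensionality reduction.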
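One common reading of "empirical kernel space" is to map each sample to its vector of kernel evaluations against all training samples and then run the linear method on those features. The sketch below follows that reading for the first PLS latent score only. It is an assumption about the construction, not the thesis's derivation; the Gaussian kernel, the width parameter `gamma`, the function names, and the ring-shaped toy data are all hypothetical.

```python
import numpy as np

def rbf_kernel(X, Z, gamma):
    """Gaussian kernel matrix with entries exp(-gamma * ||x - z||^2)."""
    sq = (X**2).sum(axis=1)[:, None] + (Z**2).sum(axis=1)[None, :] - 2.0 * X @ Z.T
    return np.exp(-gamma * np.maximum(sq, 0.0))

def empirical_kernel_pls_score(X, y, gamma):
    """First PLS latent score computed on empirical-kernel-map features."""
    F = rbf_kernel(X, X, gamma)   # row i = (k(x_i, x_1), ..., k(x_i, x_n))
    F = F - F.mean(axis=0)        # column-center the mapped features
    yc = y - y.mean()             # center the response
    w = F.T @ yc                  # weight vector: covariance of each feature with y
    w /= np.linalg.norm(w)
    return F @ w                  # latent score t for every sample

# Toy data: two concentric rings, which no linear projection can separate.
angles = np.linspace(0.0, 2.0 * np.pi, 20, endpoint=False)
ring = np.column_stack([np.cos(angles), np.sin(angles)])
X = np.vstack([1.0 * ring, 3.0 * ring])
y = np.array([1.0] * 20 + [-1.0] * 20)

t = empirical_kernel_pls_score(X, y, gamma=0.5)
```

On this toy set the inner-ring scores lie above the outer-ring scores, so a single nonlinear component separates the classes. Changing `gamma` changes the kernel width and therefore the projection, which is the kind of parameter dependence the chapter examines on real data. Further components would need a deflation step that is not shown here.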
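     Chapter 5 studies mathematical modeling related to magnetic resonance experiments on the dynamics of biological macromolecules. Biomacromolecular dynamics is concerned with the transient structures of proteins and their functions. For the carbohydrate phosphotransferase system of Escherichia coli, we build a kinetic model of the relevant proteins from a system of ordinary differential equations and explain the biological mechanisms that may underlie new NMR findings such as weak protein-protein interactions and a binary phosphate-group transfer pathway. A simple elementary-reaction model illustrates the relationship between the dissociation constant and phosphate-group transfer efficiency. A mathematical model of the transport system that includes the binary pathway is then used to compute the phosphate-group transfer efficiency of the ternary and binary pathways.
     Chapter 6 summarizes the work of the thesis and lists problems that remain open.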
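A minimal sketch of what an elementary-reaction model of this kind might look like is given below. It assumes a single donor-acceptor step DP + A <-> C -> D + AP with mass-action kinetics. The reaction scheme, the species names, the rate constants, and the initial concentrations are all hypothetical and are not taken from the thesis; the dissociation constant enters as K_d = k_off / k_on.

```python
import numpy as np

def rates(state, k_on, k_off, k_cat):
    """Right-hand side for  DP + A <-> C -> D + AP  (mass-action kinetics)."""
    dp, a, c, d, ap = state
    v_bind = k_on * dp * a - k_off * c   # net formation of the complex C
    v_cat = k_cat * c                    # phosphate-group transfer and release
    return np.array([-v_bind, -v_bind, v_bind - v_cat, v_cat, v_cat])

def simulate(k_on, k_off, k_cat, dp0=1.0, a0=1.0, t_end=10.0, dt=0.01):
    """Integrate with the classical fourth-order Runge-Kutta scheme."""
    n_steps = int(round(t_end / dt))
    traj = np.empty((n_steps + 1, 5))
    traj[0] = [dp0, a0, 0.0, 0.0, 0.0]   # columns: DP, A, C, D, AP
    for i in range(n_steps):
        s = traj[i]
        k1 = rates(s, k_on, k_off, k_cat)
        k2 = rates(s + 0.5 * dt * k1, k_on, k_off, k_cat)
        k3 = rates(s + 0.5 * dt * k2, k_on, k_off, k_cat)
        k4 = rates(s + dt * k3, k_on, k_off, k_cat)
        traj[i + 1] = s + dt / 6.0 * (k1 + 2 * k2 + 2 * k3 + k4)
    return traj

# Hypothetical rate constants in arbitrary units; here K_d = k_off / k_on = 0.5.
traj = simulate(k_on=10.0, k_off=5.0, k_cat=2.0)
phosphorylated_acceptor = traj[:, 4]
```

One possible working definition of transfer efficiency is the fraction of acceptor that is phosphorylated by `t_end`, that is `phosphorylated_acceptor[-1] / a0`. Rerunning `simulate` with different `k_off` at fixed `k_on` then shows how that number depends on K_d in this toy scheme. The sketch makes no claim about which regime the thesis identifies as efficient.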
Nuclear magnetic resonance (NMR) is widely used in physics, chemistry, biology, medicine, and other scientific fields. Biological NMR provides methodological support for life-science research at the molecular, cellular, and whole-organism levels, especially in protein structure and dynamics and in metabonomics. We concentrate on an interdisciplinary field: bio-NMR data analysis and mathematical modeling. The experimental data and phenomena are provided by our collaborators at the Wuhan Magnetic Resonance Center.
     The thesis consists of six parts. The first chapter introduces the background on data analysis and bio-NMR related to our work.
     In the second chapter, we prove a central limit theorem for the Rényi relative entropy of order α (0<α<1) and obtain a sharp rate of convergence. By carefully analyzing the Rényi relative entropy between the distribution of the normalized sum of i.i.d. random variables and the Gaussian distribution, we establish the central limit theorem and its sharp convergence rate. This rate of convergence is then applied to model selection and model diagnosis.
     In the third chapter, we propose a new normalization method for one-dimensional spectral data in metabolomics: CPIN (clustering partial integral normalization). The key idea of normalization is to select a group of bins as a reference against which variations of metabolites are measured. We use hierarchical clustering to obtain candidate groups, balance the trade-off between similarity and diversity, and improve consistency with OPLS. The procedure and the rationale of CPIN are described in detail. The validity of CPIN is demonstrated on two groups of samples of 1H spectra.
     Chapter four discusses dimension reduction and visualization of the NMR spectra of metabolites. We generalize conventional linear dimensionality-reduction methods to the corresponding nonlinear methods by using kernel methods. We give rigorous mathematical derivations of the dimensionality-reduction methods widely used for NMR data in metabonomics (such as PCA, LDA, PLS, and OPLS), then extend PLS and OPLS to kernel space by using kernel methods. We use real NMR data of metabolites to show the validity of the proposed nonlinear dimension-reduction methods.
     The fifth chapter presents mathematical modeling of the dynamics of biological macromolecules studied by magnetic resonance experiments. For the E. coli sugar phosphotransferase system, we establish a kinetic model of the proteins using ordinary differential equations and elaborate on the weak interactions among them. The model captures the biological mechanisms that may underlie new NMR experimental findings. Specifically, a simple reaction model explains the relationship between phosphate-group transfer efficiency and the dissociation constant. A mathematical model of the transport system that includes the binary pathway is then used to compute the phosphate-group transfer efficiency of the 2-pathway and the 3-pathway.
     Chapter six summarizes the work of the thesis and lists some problems for further research.
