New Algorithms for Chemical Data Mining and Fundamental Research on Quantitative Structure-Property Relationships
Abstract
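Chemical data mining is attracting growing attention from chemists. To mine the differences in retention behavior among compounds recorded in chromatographic retention index data, nearly 50 000 retention index records were collected to build a retention index database. Problems met in building and using the database are also discussed, including error detection and correction of the data, temperature correction of retention indices, and estimation of experimental error. In this work, projection pursuit is used to mine the data involved in topological index-retention index relationship studies, and a projection pursuit algorithm is constructed. Projection pursuit of alkanes, alkenes and cycloalkanes shows that compounds of different structures fall into distinct classes according to the number of carbon atoms, the number of branches, the number and position of double bonds, whether the double bonds are conjugated, the number of rings, and the branches on the rings. Using this classification information, separate topological index-retention index and topological index-boiling point models are built for each class of compounds. For alkanes, the standard errors of these models approach or reach the level of experimental error, and the models predict well. In addition, when the projection direction is constructed from the compounds of a single homologous series, a classification by homologous series is obtained; from it a class distance variable is proposed, with which very good structure-property models can be built.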
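The abstract does not spell out the new projection pursuit algorithm. As a rough illustration of the general idea only, the Python sketch below tries random unit directions and keeps the one whose one-dimensional projection has the lowest kurtosis, a standard projection index that favors projections splitting into separate groups. The kurtosis index, the random search, all function names, and the two-group synthetic data (standing in for topological index descriptors) are assumptions of this sketch, not the thesis's algorithm.

```python
import math
import random


def kurtosis(values):
    """Moment kurtosis m4 / m2**2: about 3 for Gaussian data, near 1 for two tight groups."""
    n = len(values)
    mean = sum(values) / n
    m2 = sum((v - mean) ** 2 for v in values) / n
    m4 = sum((v - mean) ** 4 for v in values) / n
    return m4 / (m2 * m2)


def random_unit_vector(dim, rng):
    """Direction drawn uniformly on the unit sphere (normalized Gaussian vector)."""
    v = [rng.gauss(0.0, 1.0) for _ in range(dim)]
    norm = math.sqrt(sum(c * c for c in v))
    return [c / norm for c in v]


def project(samples, direction):
    """One-dimensional projection scores z_i = w . x_i."""
    return [sum(w * x for w, x in zip(direction, row)) for row in samples]


def projection_pursuit(samples, n_trials=2000, seed=0):
    """Random-search projection pursuit (hypothetical): keep the direction whose
    projected scores have the lowest kurtosis, i.e. split most clearly into groups."""
    rng = random.Random(seed)
    dim = len(samples[0])
    best_dir, best_index = None, math.inf
    for _ in range(n_trials):
        w = random_unit_vector(dim, rng)
        index = kurtosis(project(samples, w))
        if index < best_index:
            best_dir, best_index = w, index
    return best_dir, best_index


# Synthetic stand-in for two descriptor columns: two groups split along the first axis.
rng = random.Random(1)
data = [[-3.0 + rng.gauss(0, 0.3), rng.gauss(0, 1)] for _ in range(100)]
data += [[3.0 + rng.gauss(0, 0.3), rng.gauss(0, 1)] for _ in range(100)]
w, k = projection_pursuit(data)
scores = project(data, w)
print("direction:", [round(c, 3) for c in w], "kurtosis:", round(k, 3))
```

The kurtosis index is only one of many possible projection indices; a real implementation would usually sphere the data first and replace random search with an optimizer.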
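Using orthogonalization between topological indices and taking the target property into account, a similarity evaluation index and a difference evaluation index for topological indices are proposed to quantify the correlation between topological indices and the contribution of each topological index to the regression. The results show that they describe the relationships between variables reasonably well and can guide variable selection in quantitative structure-property relationship (QSPR) studies. This work also proposes the concept of a block variable: several structural descriptors with similar definitions are grouped together to form one block variable. By partitioning a set of topological indices into blocks, orthogonalizing them, and reducing each orthogonalized block variable to one dimension with canonical correlation analysis, a set of new variables is obtained that retains most of the information in the original variables while greatly reducing the number of variables. This approach substantially improves the fitting and predictive ability of structure-property models.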
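The abstract does not give formulas for the similarity and difference evaluation indices. The sketch below assumes one plausible reading built on Randić-style orthogonalization: a descriptor is regressed on a reference descriptor, the share of its variance explained is taken as similarity, the share left in the orthogonal residual as difference, and the residual's squared correlation with the property as a hypothetical measure of what the descriptor adds to a regression. All definitions, names and data here are illustrative, not the thesis's.

```python
import math


def mean(xs):
    return sum(xs) / len(xs)


def orthogonalize(x_ref, x):
    """Residual of x after least-squares regression on x_ref (Randic-style orthogonal descriptor)."""
    mr, mx = mean(x_ref), mean(x)
    sxx = sum((a - mr) ** 2 for a in x_ref)
    sxy = sum((a - mr) * (b - mx) for a, b in zip(x_ref, x))
    slope = sxy / sxx
    intercept = mx - slope * mr
    return [b - (intercept + slope * a) for a, b in zip(x_ref, x)]


def corr(x, y):
    """Pearson correlation coefficient."""
    mx, my = mean(x), mean(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def similarity_difference(x_ref, x, y):
    """Hypothetical indices for descriptor x relative to x_ref:
    similarity = share of var(x) explained by x_ref,
    difference = share left in the orthogonalized residual,
    gain = squared correlation of that residual with the property y."""
    resid = orthogonalize(x_ref, x)
    mx = mean(x)
    total = sum((b - mx) ** 2 for b in x)
    left = sum(r * r for r in resid)  # residual has zero mean, so this is its sum of squares
    difference = left / total
    similarity = 1.0 - difference
    gain = corr(resid, y) ** 2 if left > 1e-12 * total else 0.0
    return similarity, difference, gain


ti_a = [1.0, 2.0, 3.0, 4.0]
ti_b = [2 * v + 1 for v in ti_a]   # a rescaled copy of ti_a
ti_c = [1.0, -1.0, -1.0, 1.0]      # uncorrelated with ti_a
prop = [a + c for a, c in zip(ti_a, ti_c)]
print(similarity_difference(ti_a, ti_b, prop))
print(similarity_difference(ti_a, ti_c, prop))
```

A rescaled copy of a descriptor scores as fully similar and adds nothing, while an uncorrelated descriptor scores as fully different and can still carry property information.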
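Chromatographic analysis of complex samples is often a grey analytical system, in which some components are known and others are unknown. This work proposes a model and an algorithm for calculating the dead time and the retention times of n-alkanes in a grey analytical system, and uses the large body of retention index data recorded in the literature to identify the unknown components. In applications to the chromatographic analysis of two petroleum products, the dead time calculated by the algorithm is very close to the experimental value, and the calculated n-alkane retention times and retention indices of the unknown components also agree closely with the experimental measurements.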
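Identifying unknown peaks from literature retention indices requires converting retention times into retention indices. For isothermal runs, the Kovats index of a solute eluting between the n-alkanes with n and n + 1 carbon atoms is I = 100[n + (log t'_x - log t'_n)/(log t'_{n+1} - log t'_n)], where t' = t_R - t_0 is the adjusted retention time and t_0 is the dead time. The small Python function below evaluates this formula; the function name and the numbers in the example are illustrative, not measured data.

```python
import math


def kovats_index(t_x, t_n, t_n1, n, t0):
    """Isothermal Kovats retention index of a solute that elutes between the
    n-alkanes with n and n + 1 carbon atoms.

    t_x, t_n, t_n1 are raw retention times and t0 is the dead time, all in the
    same unit; the adjusted times t - t0 must be positive.
    """
    adj_x, adj_n, adj_n1 = t_x - t0, t_n - t0, t_n1 - t0
    if min(adj_x, adj_n, adj_n1) <= 0:
        raise ValueError("retention times must exceed the dead time")
    frac = (math.log10(adj_x) - math.log10(adj_n)) / (math.log10(adj_n1) - math.log10(adj_n))
    return 100.0 * (n + frac)


# Illustrative numbers only: C6 n-alkane at 5.0 min, C7 n-alkane at 17.0 min,
# dead time 1.0 min, unknown peak at 9.0 min.
print(round(kovats_index(9.0, 5.0, 17.0, 6, 1.0), 2))
```

Because the index uses a ratio of logarithms, the base of the logarithm does not matter; an error in t_0, however, distorts every adjusted time, which is why the dead time has to be estimated carefully.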
This work focuses on data mining of chromatographic retention index data. A retention index database containing about 50 000 retention index records is first established. Projection pursuit is then used to mine these data for useful information about the relationship between retention indices and structural descriptors, and a new projection pursuit algorithm is developed for this purpose. Alkanes, alkenes and cycloalkanes are investigated. With the new algorithm, several interesting classifications of these hydrocarbons are revealed, based on structural features such as the number of carbon atoms, the number of branches, the number and position of double bonds, conjugated versus nonconjugated double bonds, and the number of rings. Separate models between topological indices and retention indices are then established for the different classes of samples obtained from the projections, which significantly improves the regression. This shows that even for alkanes there are in fact several distinct linear models. Furthermore, when the compounds of a single homologous series are used to calculate the projection direction, projection pursuit gives an interesting result: all homologous series are separated from one another, with regular distances between them. Based on this observation, a new variable, the class distance variable, is proposed to describe the differences between classes of homologs. With this variable a much better model is obtained, whose estimation and prediction errors are very small, close to the level of measurement error.
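The gain from fitting a separate model for each class can be illustrated with synthetic data. The sketch below is a stand-in, not the thesis's models: it compares one pooled straight line of retention index against a single topological index with one line per class. The class labels play the role of groups found by projection pursuit, and the data are invented so that the two classes share a slope but differ in intercept; the function names are hypothetical.

```python
import math


def fit_line(x, y):
    """Ordinary least squares for y = a + b*x; returns (a, b)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxx = sum((u - mx) ** 2 for u in x)
    sxy = sum((u - mx) * (v - my) for u, v in zip(x, y))
    b = sxy / sxx
    return my - b * mx, b


def sse(x, y, a, b):
    """Residual sum of squares of the line a + b*x."""
    return sum((v - (a + b * u)) ** 2 for u, v in zip(x, y))


def pooled_vs_per_class(x, y, labels):
    """RMSE of one pooled line versus one line per class label."""
    a, b = fit_line(x, y)
    pooled = math.sqrt(sse(x, y, a, b) / len(x))
    models, total = {}, 0.0
    for c in sorted(set(labels)):
        xs = [u for u, lab in zip(x, labels) if lab == c]
        ys = [v for v, lab in zip(y, labels) if lab == c]
        models[c] = fit_line(xs, ys)
        total += sse(xs, ys, *models[c])
    return pooled, math.sqrt(total / len(x)), models


# Synthetic data: two classes share a slope but differ in intercept.
x = [1.0, 2.0, 3.0, 4.0, 5.0, 1.5, 2.5, 3.5, 4.5, 5.5]
labels = ["A"] * 5 + ["B"] * 5
y = [10 * u + (100 if lab == "A" else 130) for u, lab in zip(x, labels)]
pooled, per_class, models = pooled_vs_per_class(x, y, labels)
print(round(pooled, 2), round(per_class, 6), models)
```

A pooled line has to average over the intercept gap, so its error stays large no matter how the slope is chosen; per-class lines remove that systematic error.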
Two indices, the similarity evaluation index and the difference evaluation index, are proposed in this work. They can be used to quantify the correlation between topological indices (TIs) and to estimate each TI's contribution to the regression model in QSPR. Applied to a data set of alkanes and alkenes, they describe the relationships between TIs with reasonable results and show potential usefulness in variable selection. A block descriptor, which groups a series of individual TIs with similar definitions, is also proposed. After individual topological indices are combined into a few blocks, a set of new one-dimensional variables is obtained by canonical correlation analysis without losing the major information. Models with few variables built from these new variables describe the retention indices of alkanes with improved performance, showing high correlation coefficients and small residuals.
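The block-variable reduction can be sketched under one stated assumption: that the second set in the canonical correlation analysis is a single property, such as the retention index. In that case the first canonical variate of a descriptor block is, up to scale, the least-squares linear combination of the block's centered columns, and the canonical correlation equals the multiple correlation coefficient. The pure-Python sketch below relies on that fact; the orthogonalization step described in the thesis is not reproduced, and all names and data are hypothetical.

```python
import math


def center(col):
    """Subtract the column mean."""
    mu = sum(col) / len(col)
    return [v - mu for v in col]


def solve(a, b):
    """Solve a small dense system a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[piv] = m[piv], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        x[r] = (m[r][n] - sum(m[r][c] * x[c] for c in range(r + 1, n))) / m[r][r]
    return x


def block_variable(block, y):
    """Collapse a block of descriptor columns into one variable.

    With a single response y, the first canonical variate of the block is,
    up to scale, the least-squares combination of its centered columns, so the
    weights come from the normal equations (X'X) w = X'y.
    """
    cols = [center(c) for c in block]
    yc = center(y)
    k = len(cols)
    xtx = [[sum(p * q for p, q in zip(cols[i], cols[j])) for j in range(k)] for i in range(k)]
    xty = [sum(p * q for p, q in zip(cols[i], yc)) for i in range(k)]
    w = solve(xtx, xty)
    z = [sum(w[i] * cols[i][s] for i in range(k)) for s in range(len(y))]
    return z, w


def corr(x, y):
    """Pearson correlation; for z from block_variable this is the canonical correlation."""
    x, y = center(x), center(y)
    sxy = sum(p * q for p, q in zip(x, y))
    return sxy / math.sqrt(sum(p * p for p in x) * sum(q * q for q in y))


# Hypothetical block of two related indices and a property built from them.
ti_1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0]
ti_2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0]
prop = [a + 2 * b for a, b in zip(ti_1, ti_2)]
z, w = block_variable([ti_1, ti_2], prop)
print([round(v, 6) for v in w], round(corr(z, prop), 6))
```

Each block then contributes one column z to the final regression, which is how the number of variables drops while most of the block's property-relevant information is kept.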
    In the chromatographic analysis of complex multicomponent samples, one often encounters grey analytical systems, in which some components are identified and others are unknown. A model and algorithm for calculating the dead time and the retention times of n-alkanes in a grey analytical system are developed. With the calculated dead time and n-alkane retention times, the retention indices of unknown components can be calculated easily. For two petroleum product samples, the calculated dead time, n-alkane retention times and retention indices of unknown components agree well with the experimental values, with small errors.
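The abstract does not describe the grey-system algorithm itself. For orientation, the sketch below uses the classical approach, assuming ln(t_R - t_0) is linear in carbon number: three identified n-alkanes with consecutive carbon numbers then give t_0 = (t_1 t_3 - t_2^2)/(t_1 + t_3 - 2 t_2), and a log-linear fit predicts the retention times of n-alkanes missing from the chromatogram. This is a textbook stand-in, not the thesis's method; the function names are hypothetical and the chromatogram is synthetic.

```python
import math


def dead_time(t1, t2, t3):
    """Dead time from three n-alkanes with consecutive carbon numbers.

    Assumes ln(t_R - t0) is linear in carbon number, so the adjusted times form
    a geometric sequence: (t2 - t0)**2 = (t1 - t0) * (t3 - t0).
    """
    denom = t1 + t3 - 2.0 * t2
    if denom == 0:
        raise ValueError("evenly spaced times: the log-linear model cannot fix t0")
    return (t1 * t3 - t2 * t2) / denom


def fit_log_line(carbons, times, t0):
    """Least-squares fit of ln(t_R - t0) = a + b * n over the identified n-alkanes."""
    ys = [math.log(t - t0) for t in times]
    n = len(carbons)
    mc, my = sum(carbons) / n, sum(ys) / n
    b = sum((c - mc) * (v - my) for c, v in zip(carbons, ys)) / sum((c - mc) ** 2 for c in carbons)
    return my - b * mc, b


def predict_alkane_time(n_carbon, a, b, t0):
    """Retention time of an n-alkane that was not identified in the chromatogram."""
    return t0 + math.exp(a + b * n_carbon)


# Synthetic chromatogram generated from the model itself (t0 = 1.2 min).
true_t0, true_a, true_b = 1.2, -2.0, 0.7
known = {n: true_t0 + math.exp(true_a + true_b * n) for n in (6, 7, 8)}
t0 = dead_time(known[6], known[7], known[8])
a, b = fit_log_line(list(known), list(known.values()), t0)
t_c9 = predict_alkane_time(9, a, b, t0)
print(round(t0, 4), round(t_c9, 4))
```

The predicted n-alkane times and t_0 can then be fed into the isothermal Kovats formula to index unknown peaks; real chromatograms deviate from the log-linear model, so the three-alkane estimate is sensitive to measurement noise.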
引文
[1] 徐光宪.21世纪的化学是研究泛分子的科学.化学科学部基金成果报告会文集(庆祝国家 自然科学基金委员会成立十五周年(1986-2001) ),北京:2001,11:3-9.
    [2] Fayyad M U, Uthurusamy R. Data mining and knowledge discovery in databases (introduction to the special section). Communications of The ACM 1996, 39(11) : 24-26.
    [3] Glymour C, Madigan D, Pregibon D, Smyth P. Statistical inference and data mining. Communications of The ACM 1996, 39 (11) : 35-41.
    [4] Inmon W H. The data warehouse and data mining. Communications of the ACM 1996, 39 (11) : 49-50.
    [5] Fayyad M U, Haussler D, Stolorz P. Mining scientific data. Communications of the ACM. 1996, 39 (11) : 51-57.
    [6] Hand D J, Blunt G, Kelly M G, Adams N M. Data mining for fun and. profit. Stat Sci 2000, 15: 111-131.
    [7] 杨炳儒.知识工程与知识发现,北京:冶金工业出版社,2000
    [8] Agrawal R. Data mining: a performance perspective IEEE Transactions on Knowledge and Data Engineering, 1997, 5: 914-925
    [9] Fayyad M U. Knowledge discovery and data mining: towards a unifying framework. In: Proceedings of KDD-96, Menlo Park, CA: AAAI Press. 1996, 82-88
    [10] Fayyad M U. Advances in Knowledge discovery and data mining. Menlo Park, CA: AAAI Press. 1996.
    [11] Heckerman D. Bayesian networks for data mining. Data Mining & Knowledge Discovery, 1997, 1:79-119.
    [12] Glymour G. Statistical themes and lessons for data mining. Data Mining & Knowledge Discovery, 1997, 1: 11-28.
    [13] 肖利,王能斌.挖掘转移功能:一种新的数据挖掘技术.计算机研究与发展,1998,15: 902-906.
    [14] Houtsma M, Swami A. Set-oriented data mining in relatioal databases. Data & Knowledge Engineering, 1995, 17: 245-262.
    [15] Knorr E, Ng R. Finding aggregate proximity relationships and commonalities in spatial data mining. IEEE Transactions on Knowledge and Data Engineering, 1996, 8: 884-897.
    [16] Han J. Generalization-based data mining in object-oriented databases using an object cube model. Data & Knowledge Engineering, 1998, 25: 55-97.
    [17] Chen M S, Han J, Yu P S. Data Mining : An overview from a data base perspective, IEEE Transactions on Knowledge and Data Engineering, 1996, 8: 866-883.
    [18] Cios K J, Pedrycz W, Swiniaarski R. Data Mining Methods for Knowledge Discovery. Kluwer Academic Publishers, 1998.
    [19] http://www.hncs.com, 2002.
    [20] http://www.think.com, 2002.
    [21] http://www.datamindcorp.com, 2002.
    [22] http://www.software.ibm.com, 2002.
    [23] http://www.angoss.com, 2002.
    
    
    [24] 梁逸曾,俞汝勤.分析化学手册(第十分册)-化学计量学,北京:化学工业出版社,2000.
    [25] Lavine B K, Workman J. Chemometrics, Anal. Chem. 2002, 74: 2763-2770.
    [26] Buydens Lutgarde M C, Reijmers Theo H, Beckers Mischa LM, wehrens R. Molecular data-mining: a challenge for chemometrics. Chemometrics and intelligent laborary systems.1999,49: 121-133.
    [27] Liang Y Z, Gan F. Chemical knowledge discovery from mass spectral database. I. Isotope distribution and Beynon table, Anal. Chim. Acta 2001, 446:107-114.
    [28] Inselberg A. Visualization and data mining of high-dimensional data. Chemom. Intell. Lab. Syst. 2002, 60: 147-159.
    [29] Bryant C H, Rowe R C. Knowledge discovery in databases: application to chromatography, Trends Anal. Chem. 1998, 17: 18-24.
    [30] Debska B J, Guzowska-Swider B. Knowledge discovery in an Infrared Database. Computers Chem. 1997,21:51-59.
    [31] Hayward J, Buchan I. Benefits of data cartridge technology forhandling chemical information. Curr Opin Drug Disc Devel. 2000, 3: 306-309.
    [32] Wedin R. Visual data mining speeds drug discovery. Mod. Drug. Disc. 1999,2:39-47.
    [33] Bayada D M, Hamersma H, van Geerestein V J. Molecular diversity and representativity in chemical databases. J Chem. Inf. Comput. Sci. 1999,39: 1-10.
    [34] Xie D, Tropsha A, Schlick T. An efficient projection protocol for chemical databases: singular value decomposition combined with truncated-Newton minimization. J Chem. Inf. Comput. Sci. 2000,40:167-177.
    [35] Jiang J H, Wang J H, Liang Y Z, Yu R Q. A non-linear mapping-based generalized backpropagation network for unsupervised learning. J Chemomet. 1996,10:241-252.
    [36] Agrafiotis D K, Lobanov V S. Nonlinear mapping networks. J Chem. Inf. Comput. Sci. 2000, 40:1356-1362.
    [37] Su H, Che Z H, Wu J M, Li R. Classification mapping and its application on chemical systems. J Chem. Inf. Comput. Sci. 1999,39: 718-727.
    [38] Sheridan R P, Miller M D. A method for visualizing recurrent topological substructures in sets of active molecules. J Chem. Inf. Comput. Sci. 1998, 38: 915-924.
    [39] Roberts G, Myatt G J, Johnson W P, Cross K P, Blower P E Jr. Lead, scope: software for exploring large sets of screening data. J Chem. Inf. Comput. Sci. 2000,40:1302-1314.
    [40] Ghose A K, Viswanadhan V N, Wendoloski J J. A knowledge-based approach in designing combinatorial or medicinal chemistry libraries for drug discovery. 1. A qualitative and quantitative characterization of known drug databases. J Comb. Chem. 1999,1: 55-68.
    [41] Xu J, Stevenson J. Drug-like index: a new approach to measure drug-like compounds and their diversity. J Chem. Inf. Comput. Sci. 2000,40: 1177-1187.
    [42] Brown R D, Hassan M, Waldman M. Combinatorial library design for diversity, cost efficiency and drug-like character. J Mol. Graph. 2000,18: 427-437.
    [43] Oprea T I. Property distribution of drug-related chemical, databases. J Comput-Aided Mol. Design, 2000,14:251-264.
    [44] Harm M, Hudson B, Lewell X, Lifely R, Miller L, Ramsden N. Strategic pooling of compounds for high throughput screening. J Chem. Inf. Comput. Sci. 1999, 39: 897-902.
    [45] Ajay W, Walters W, Murcko M A. Can we learn to distinguish between 'drug-like' and
    
    'nondrug-like' molecules? J Med. Chem. 1998,41: 3314-3324.
    [46] Sadowski J, Kubinyi H. A scoring scheme for discriminating between drugs and nondrugs. J Med. Chem. 1998,41: 3325-3329.
    [47] Frimurer T M, Bywater R, Naerum L, Lauritsen L N, Brunak S. Improving the odds in discriminating 'drug-like' from 'non drug-like' compounds. J Chem. Inf. Comput. Sci. 2000, 40: 1315-1324.
    [48] Rusinko AⅢ, Farmen M W, Lambert C G, Brown P L, Young S S. Analysis of a large structure/biological activity data set using recursive partitioning. J Chem. Inf. Comput. Sci. 1999, 39:1017-1026.
    [49] Chen X, Rusinko A Ⅲ, Young S S. Recursive partitioning analysis of a large structure-activity data set using three-dimensional descriptors. J Chem. Inf. Comput. Sci. 1998,38:1054-1062.
    [50] Chen X, Rusinko A Ⅲ, Tropsha A, Young S S. Automated pharmacophore identification for large chemical datasets. J Chem. Inf. Comput. Sci. 1999, 39: 887-896.
    [51] Cho S J, Shen C F, Hermsmeier M A. Binary formal inference-based recursive modelling using multiple atom and physicochemical property class pair and torsion descriptors as decision criteria. J Chem. Inf. Comput. Sci. 2000,40: 668-680.
    [52] Miller D W. Results of a new classification algorithm combining. k-nearest neighbours and recursive partitioning. J Chem. Inf. Comput. Sci. 2001,41: 168-175.
    [53] Mello K L, Brown S D. Novel 'hybrid' classification method using Bayesian networks. J Chemomet 1999, 13: 579-590.
    [54] Jones-Hertzog D K, Mukhopadhyay P, Keefer C E, Young S S. Use of recursive partitioning in the sequential screening of G-protein-coupled receptors. J Pharmacol Toxicol Methods 1999, 42: 207-215.
    [55] Stanton D T, Morris T W, Roychoudhury S, Parker C N. Application of nearest-neighbour and cluster analyses in pharmaceutical lead discovery. J Chem. Inf. Comput. Sci 1999, 39: 21-27.
    [56] Engels M F M, Thielemans T, Verbinnen D, Tollenaere J P, Verbeek R. CerBeruS: a system supporting the sequential screening process. J Chem. Inf. Comput. Sci. 2000,40: 241-245.
    [57] Lepre C A. Library design for NMR-based screening. Drug Discov Today 2001,6: 133-140.
    [58] Ajay W, Bemis G W, Murcko M A. Designing libraries with CNS activity. J Med. Chem. 1999, 42:4942-4951.
    [59] Izrailev S, Agrafiotis D K. A novel method for building regression tree models for QSAR based on artificial ant colony systems. J Chem. Inf. Comput. Sci. 2001,41:176-180.
    [60] Shi L M, Fan Y, Lee J K, Waltham M, Andrews D T, Scherf U, Paull K D, Weinstein J N. Mining and visualizing large anticancer drug discovery databases. J Chem. Inf. Comput. Sci. 2000, 40:367-379.
    [61] Scherf U, Ross D T, Waltham M, Smith L H, Lee J K, Tanabe L,. Kohn K W, Reinhold W C, Myers TG, Andrews D T et al. A gene expression database for the molecular pharmacology of cancer. Nat Genet 2000, 24:236-244.
    [62] 李浩春.《分析化学手册-气相色谱分册》,第二版,北京:化学工业出版社,1999,356-691.
    [63] Kovats, E. Gas chromatographic characterization of organic compounds. I. Retention indexes of aliphatic, halides, alcohols, aldehydes, and ketones. Helv. Chim. Acta, 1958,41: 1915-1932.
    [64] Boneva S, Dimov N. Gas chromatographic retention indices for alkenes on ov-101 and squalane
    
    capillary columns. Chromatographia. 1986,21: 149-150.
    [65] Schomburg G. Gas chromatographische retentionsdaten und struktur chemischer verbindungen; Ⅲ. Alkylverzweigte und ungesattigte cyclische kohlenwasserstoffe. J. Chromatogr., 1966, 23: 18-41
    [66] Evans M B, Smith J F. Gas-liquid chromatography in quantitative analysis. Ⅷ. Evaluation of polymeric stationary phases with special reference to the poly(oxyethylene) glycols. J. Chromatogr., 1968, 36(4) : 489-503.
    [67] Wallaert B. Determination of Kovats indexes on a polar capillary. Bull Soc. Chim. (Fr.). 1971, 3: 1107-1109.
    [68] Tourres D A. Structure moleculaire et retention en chromatographie en phase gazeuse ; Influence de la temperature sur 1'indice de retention d'alcanes isometres. J. Chromatogr., 1967, 30: 357-377.
    [69] Sojak L. Capillary gas chromatographie analysis of seperation products of alkylaromatic Ce to Cio isomers. J. Burkhard, Ropa Uhlie, 1969, 11(7) : 361-372.
    [70] Cramers C A, Rijks J A, Pacakova V. The application of precision gas chromatography to the identification of types of hydrocarbons. J. Chromatogr., 1970, 51: 13-21.
    [71] Mitra G D, Saha N C. Determination of retention indices of saturated hydrocarbons by graphical methods. J. Chromatogr. Sci., 1970, 8(2) : 95-102.
    [72] Sojak L, Hrivnak J, Ostrovsky I, Janak J. Capillary gas chromatography of linear alkenes on squalane separation and identification of n-pentadecenes and n-hexadecenes. J. Chromatogr. 1974,91:613-622.
    [73] Castello G, D'Amato G. Use of linear and branched-chain paraffinic liquid phases as non-polar reference materials in gas chromatography. J. Chromatogr., 1979, 175: 27-35.
    [74] Kumar P, Sarowha S, Gupta P. Kovats retention indices of identified hydrocarbons using a squalane capillary column. Analyst, 1979, 104: 788-791.
    [75] Mlejnek O. Application of pyrolysis-capillary gas chromatography to the characterization of polyethylene.J. Chromatogr., 1980, 191: 181-186.
    [76] 李浩春,戴朝政,徐方宝,卢佩章.用微处理机计算和预测低沸点异构烷烃的保留指数,科 学通报,1981,26,1491. ;中国科学院大连化物所资料,1981.
    [77] Schroeder H. Retention indexes of hydrocarbons up to CM for the stationary phase squalane. HRC CC(High Resolut. Chromatogr. Chromatogr. Commun.), 1980, 3:38-44.
    [78] Sojak L, Kral'Ovicova E, Ostrovsky I, Leclercq P A. Retention behaviour of cinjugated and isolated n-alka-dienes. identification of n-nona-and n-decadienes by capillary gas chromatography using structure-retention correlations and mass spectrometry. J. Chromatogr. 1984,292:241-261.
    [79] Sojak L, Ostrovsky I, Leclercq P A, Rijks J A. Identification of n-hepta-and n-octadienes by high-resolution gas chromatography using structure-retention corelations and mass spectrometry. J. Chromatogr. 1980, 191: 187-198.
    [80] Sojak L, Hrivnak J, Majer P. Capillary Gas chromatography of linear alkenes on squalane. Anal. Chem. 1973,45: 293-302.
    [81] Zulaica J, Guiochon G. Analysis of high polymers and their pyrolysis products by gas chromatography. I. method of study. Bull Soc. Chim (Fr), 1966,1343-1351.
    
    
    [82] Tourres D A. Structural analysis of industrial butene dimers by gas chromatography, J. Gas Chromatogr. 1967, 5: 35-40.
    [83] Hively R A, Hinton R E. Variation of the retention index with temperature on squalane substrates, J. Gas Chromatogr. 1968,6: 203-17.
    [84] Loewenguth J C, Tourres D A. Etude des variations des indices de retention en fonction de la temperature. Z. Anal. Chem. 1968, 236:170.
    [85] Marukuma A. Retention indeces of alkanes through C10 and alkenes through C8 and relation between boiling points and retention data. Gas Chromatography 1968. , London: Inst. of Petroleum, 1969, 55.
    [86] Schomburg G, Henneberg D. Analysis of olefin mixtures by combination of capillary gas chromatograph and mass spectrometer. Gas Chromatography 1968. , London: Inst. of Petroleum, 1969, 45.
    [87] Sojak L. Open tubular column gas chromatography of dehydrogenation products of C6-C10 n-alkanes. Separation and identification of mixtures of C6-C10 straight-chain alkanes, alkenes, and aromatics, J. Chromatogr., 1970, 51: 75-82.
    [88] Eisen O, Orav A, Rang S. Identification of normal alkenes, cyclopentenes, and cyclohexenes by capillary gas chromatography, Chromatographia, 1972, 5: 229-39.
    [89] Rijks J A, Cramers C A. High precision capillary gas chromatography of hydrocarbons, Chromatographia, 1974, 7: 99.
    [90] Vaneertum R. On the retention index calculation according to Takacs, J. Chromatogr. Sci. 1975, 13: 150.
    [91] Chretiem J R, Dubois J E. Topological analysis of gas-liquid chromatographic behavior of alkenes, Anal. Chem. 1977,49: 747-56.
    [92] Pacakova V, Kozlik V. Capillary reaction gas chromatography. 1. Catalytic decomposition of hydrocarbons, Chromatographia, 1978,11: 266.
    [93] Welsch T, Engewald W. Molecular structure and retention behaviour.Ⅸ. Retention behaviour of isomeric octynes and octadiynes, Chromatographia 1978,11:5.
    [94] Dubois J E, Chretien J R, Sojak L, Rijks J A. Topological analysis of the behavior of linear alkenes up to tetradecenes in gas-liquid chromatography on squalane, J. Chromatogr. 1980, 194: 121-34.
    [95] Karche W, Devillers J. Practical applications of Quantitative Structure-Activtity Relationships (QSAR) in Environmental Chemistry and Toxicology. Dordrecht: Kluwr Academic Publishers, 1990.
    [96] Hansch C, Leo A. Exploring QSAR. Fundamentals and Applications in Chemistry and Biology. Washington DC: American Chemical Society, 1995.
    [97] 许禄.化学计量学方法,北京:科学出版社,1995.
    [98] Devillers J, Balaban A T. Topological Indices and Related Descriptors in QSAR and QSPR, The Netherlands: Gordon and Breach Science Publishers, 1999.
    [99] Wiener H. Structural Determination of Parafin Boiling Points. J. Amer. Chem. Soc. 1947, 69: 17-20.
    [100] Balaban A T, Ivanciuc O. Historical development of topological indices. Topological Indices and Related Descriptors in QSAR and QSPR (J. Devillers and A. T. Balaban, Eds.), Gordon and
    
    Breach Science Publishers: The Netherlands, 1999, 455-489.
    [101] Hosoya H A newly proposed quantity characterizing the topological nature of structural isomers of saturated hydrocarbonsl Bull. Chem. Soc. Jpn. 1971, 44: 2332-2339.
    [102] Balaban A T. Chemical graphs. XXXIV. Five new topological indexes for the branching of tree-like graphs. Theor. Chim. Acta 1979, 53: 355-75.
    [103] Schultz H P. Topological organic chemistry. 1. Graph theory and topological indices of alkanes. J. Chem. Inf. Comput. Sci. 1989, 29: 227-228.
    [104] Schultz H P, Schultz E B, Schultz T P. Topological organic chemistry. 2. Graph theory, matrix determinants and eigenvalues, and topological indices of alkanes. J. Chem. Inf. Comput. Sci. 1990, 30: 27-29.
    [105] Randic M. On characterization of molecular branching. J. Am. Chem. Soc. 1975, 97: 6609-6615.
    [106] Kier L B, Hall L H, Murray W J, Randic M. Molecular connectivity. Part 1. Relationship to nonspecific local anesthesia. J. Pharm. Sci. 1975, 64: 1971-1974.
    [107] Kier L B, Murray W J, Randic M, Hall L H. Molecular connectivity. Part 5. Connectivity series concept applied to densit. J. Pharm. Sci. 1976, 65: 1226-1230.
    [108] Kier L B, Hall L H. Molecular Connectivity in Chemistry and Drug Research, Academic Press: New York, 1976.
    [109] Kier L B, Hall L H. Molecular Connectivity in Structure-Activity Analysis. Research Studies Press, Letchworth, 1986
    [110] Bonchev D, Trinajstic N. Information theory, distance matrix, and molecular branching. J. Chem. Phys. 1977,67:4517-4533.
    [111] Kier L B, Hall L H. An electrotopological-state index for atoms in molecules. Pharm. Res. 1990, 7: 801-807.
    [112] Hall L H, Kier L B. Electrotopological-state indices for atom types: A novel combination of electroonic, topological, and valence state information. J. Chem. Inf. Comput. Sci. 1995, 35: 1039-1045.
    [113] Hall L H, Story C T. Boiling point and critical temperature of a heterogeneous data set: QSAR with atom type electrotopological state indices using artificial neural networks. J. Chem. Inf. Comput. Sci. 1996, 36: 1004-1014.
    [114] Lovasz L, Pelikan J. On the eigenvalues of trees. Period. Math. Hung. 1973, 3: 175-182.
    [115] Cvetkovic D M, Gutman I. Note on branching. Croat. Chem. Acta 1977, 49: 115-121.
    [116] Gutman I, Markovic S. Benzenoid graphs with maximum eigenvalues. J. Math. Chem. 1993, 13: 213-215.
    [117] Bertz S H. Branching in graphs and molecules. Discrete Appl. Math. 1988, 19: 41-70.
    [118] Balaban A T. Topological indices based on topological distances in molecular graphs. Pure Appl. Chem. 1983, 55: 199-206.
    [119] Basak S C. Use of molecular complexity indices in predictive pharmacology and toxicology: A QSAR approach. Med. Sci. Res. 1987, 15: 605-609.
    [120] Balaban A T. Highly discriminating distance-based topological index. Chem. Phys. Lett. 1982, 89: 399-404.
    [121] Bonchev D, Mekenyan O, Trinajstic N. Isomer discrimination by topological information approach. J. Compt. Chem. 1981, 2: 127-148.
    [122] Rucker G, Rucker C. Counts of all walks as atomic and molecular descriptors. J. Chem. Inf.
    
    Comput. Sci. 1993, 33: 683-695.
    [123] Yee W T, Sakamoto K, Ihaya Y J. Information theory of molecular properties. I. A theoretical study of the information content of organic molecules. Rept. Univ. Electrocomm. 1977, 27: 53-63.
    [124] Merrifield R E, Simmons Ⅲ H E. The structure of molecular topological spaces. Theor. Chim. Acta. 1980, 55: 55-75.
    [125] Merrifield R E, Simmons Ⅲ H E. Enumeration of structure-sensitive graphical subsets: Calculations. Proc. Natl. Acad. Sci. USA, 1981, 78: 692-695.
    [126] Balaban A T, Balaban T S. New vertex invariants and topological indices of chemical graphs based on information on distances. J. Math. Chem. 1991, 8: 383-397.
    [127] Balaban A T, Feroiu V. Correlations between structure and critical data or vapor pressures of alkanes by means of topological indices. Rep. Mol. Theor. 1990, 1: 133-139.
    [128] Balaban A T, Ciubotariu K, Medeleanu M. Topological indices and real number vertex invariants based on graph eigenvalues or eigenvectors. J. Chem. Inf. Comput. Sci. 1991, 31: 517-523.
    [129] Plavsic K, Nikolic S, Trinajstic N, Mihalic Z. On the Harary index for the characterization of chemical graphs. J. Math. Chem. 1993, 12: 235-250.
    [130] Klein D J, Randic M. Resistance distance. J. Math. Chem. 1993, 12: 81-95.
    [131] Balaban A T, Catana C. Search for nondegenerate real vertex invariants and derived topological indices, J. Comput. Chem. 1993, 14: 155-160.
    [132] Diudea M V. Layer matrices in molecular graphs. J. Chem. Inf. Comput. Sci. 1994, 34: 1064-1071.
    [133] Diudea M V. Cluj matrix invariants. J. Chem. Inf. Comput. Sci. 1997, 37: 300-305.
    [134] Gutman I, Klavzar S. An algorithm for the calculation of the Szeged index of benzenoid hydrocarbons. J. Chem. Inf. Comput. Sci. 1995, 35: 1011-1014.
    [135] Randic M. On molecular identification numbers. J. Chem. Inf. Comput. Sci. 1984, 24: 164-175.
    [136] Ivanciuc O, Balaban A T. Design of topological indices. Part 8. Path matrices and derived molecular graph invariants. MATCH (Commun. Math. Chem.), 1994, 30: 141-152.
    [137] Katritzky A R, Lobanov B, Karelson M. CODESSA, Comprehensive Descriptors for Structural and Statistical Analysis, Reference Manual version 2. 0; University of Florida, Gainesville, FL, 1994.
    [138] Motoc I, Balaban A T, Mekenyan O, Bonchev D. Topological indices: inter-relations and composition. MATCH (Commun. Math. Chem.) 1982, 13: 369-404.
    [139] Plavsic D, Lers N, Sertic-Bionda K. On the Relation between W □/W Index, Hyper-Wiener Index, and Wiener Number. J. Chem. Inf. Comput. Sci. 2000, 40: 516-519.
    [140] Chan O, Gutman I, Lam T, Merris R. Algebraic Connections between Topological Indices. J. Chem. Inf. Comput. Sci. 1998, 38: 62-65.
    [141] Estrada E. Novel strategies in the search of topological indices. Topological Indices and Related Descriptors in QSAR and QSPR (J. Devillers and A. T. Balaban, Eds.), Gordon and Breach Science Publishers: The Netherlands, 1999, 403-453.
    [142] Basak S C, Balaban A T, Grunwald G D, Gute B D. Topological Indices: Their Nature and Mutual Relatedness. J. Chem. Inf. Comput. Sci. 2000, 40: 891-898.
    [143] Taraviras S L, Ivanciuc O, Bass D C. Identification of Groupings of Graph Theoretical
    
    Molecular Descriptors Using a Hybrid Cluster Analysis Approach. J. Chem. Inf. Comput. Sci. 2000,40:1128-1146.
    [144] Randic M. Resolution of ambiguities in structure-property studies by use of orthogonal descriptors. J. Chem. Inf. Comput. Sci. 1991, 31: 311-320.
    [145] Friedman J H, Tukey J W. A Projection pursuit algorithm for exploratory data analysis. IEEE Trans. Computers, 1974, C-23: 881-889.
    [146] Friedman J H, Stuetzle W. Projection pursuit regression. J. Amer. Statist. Assoc. 1981, 76: 817-823.
    [147] Friedman J H, Stuetzle W. Projection pursuit classification. 1980,手稿.
    [148] Friedman J H, Stuetzle W, Schroeder A. Projection pursuit density estimation. J. Amer. Statist. Assoc. 1984, 79: 599-608.
    [149] Huber P J. Density estimation and projection pursit methods. 1981, ********.
    [150] Huber P J. Projection pursuit. Research Report PJH-6,Department of Statistics, Harvard University. 1981.
    [151] Huber P J. Projection pursuit (with discussion). Annal of Statistics, 1985, 13: 435-475.
    [152] Guo Q, Wu W, Questier F, Massart DL, Boucon C, De Jong S. Sequential projection pursuit using genetic algorithms for data mining of analytical data. Anal. Chem. 2000, 72: 2846-2855.
    [153] Schoonjans V, Massart D L. Combining spectroscopic data (MS, IR): exploratory chemometric analysis for characterising similarity/diversity of chemical strictures. J. Pharm. Biomed. Anal. 2001,26:225-239.
    [154] Detroyer A, Schoonjans V, Questier F, Heyden Y, Vander Borosy, A P, Guo Q, Massart D L. Exploratory chemometric analysis of the classification of pharmaceutical substances based on chromatographic data. J. Chromatog. A, 2000, 897: 23-36.
    [155] Bellman R E. Adaptive control processes, Princeton Univ, Press, Princeton, New York..
    [156] 梁逸曾.白灰黑复杂多组分分析体系及其化学计量学算法 长沙:湖南科学技术出版社, 1996.
    [157] 王连生,支正良,高松亭.分子结构与色谱保留,北京:化学工业出版社,北京,1994, 56-59.
    [158] Kovats E. Gas chromatogrphic characterization of organic substances in the retenton index system. Adv. Chromatogr., 1965, 1: 229-249.
    [159] Loewenguth J C, Tourres D A. Etude des variations des indices de retention en fonction de la temperature. Z. Anal. Chem., 1968,236: 170-190.
    [160] Mitra G D, Saha N C. Application to temperature coefficient of retention index in the gas chromatographic analysis of nephtha. Technology 1969, 6(2-3) : 119-125.
    [161] Robinson P G, Odell A L. Comparison of isothermal and non-linear temperature programmed gas chromatography the temperature dependence of the retention indices of a number of hydrocarbons on squalane and SE-30. J. Chromatogr., 1971, 57: 11-17.
    [162] Takacs J, Rockenbauer M, Olacsi I. Determination of the relationship between retention index and column temperature in gas chromatograpy through the temperature-dependence of the net retention volume. J. Chromatogr., 1969, 42: 19-28.
    [163] Molnar E B, Moritz P, Takacs J. Determination of the temperature-dependence of the retention
    
    index in gas-liquid chromatography by computer. J. Chromatogr., 1972, 66: 205-212.
    [164] Wold S, Antti H, Lindgren F, Ohman J. Orthogonal signal correction of near-infrared spectra. Chemom. Intell. Lab. Syst. 1998, 44: 175-185.
    [165] Fearn T. On orthogonal signal correction. Chemom. Intell. Lab. Syst. 2000, 50: 47-52.
    [166] Lucic B, Nikolic S, Trinajstic N, Juretic D. The structure-property models can be improved using the orthogonalized descriptors J. Chem. Inf. Comput. Sci. 1995, 35: 532-538.
    [167] Xu L, Zhang W J. Comparison of different methods for variable selection Anal. Chim. Acta 2001,446:477-483.
    [168] Du Y P, Liang Y Z, Wu C J. Database construction of GC retention index and correction of mistakes in it. Chinese Journal of Analytical Chemistry, 2002, 19(4) : 464-466.
    [169] Balaban A T, Diudea M V. Real Number Vertex Invariants: Regressive Distance. Sums and Related Topological Indices. J. Chem. Info. Comput. Sci. 1993, 33: 421-428.
    [170] Liu S S, Cai S X, Cao C Z, Li Z L. Molecular electronegative distance vector (MEDV) relating to 15 properties of alkanes. J. Chem. Inf. Comput. Sci. 2000, 40(6) : 1337-1348.
    [171] Draper N R. Smith, H. Applied Regression Analysis. John Wiley and Sons, New York, 1981.
    [172] Furnival G M, Wilson R W. Regressions by leaps and bounds. Technometrics 1974, 16: 499-511.
    [173] Goldberg D E. Genetic Algorithms in Search, Optimization & Machine Learning Addison-Wesley, New York, 1989.
    [174] Hibbert D B. Genetic algorithm in chemistry. Chemom. Intell. Lab. Syst. 1993, 19: 277-293.
    [175] Hasegawa K, Funatsu K. GA strategy for variable selection in QSAR studies: GAPLS and D-optimal designs for predictive QSAR model. J. Mol. Struct.(Theochem) 1998, 425: 255-262.
    [176] Wikel J H, Dow E R. The use of neural networks for variable selection in QSAR. Bioorg. Med. Chem. Lett. 1993, 3: 645-651.
    [177] Sutter J M, Dixon S L, Jurs P C. Automated descriptor selection for quantitative-structure-activity relationships using generalized simulated annealing, J. Chem. Inf. Comput. Sci. 1995, 35: 77-84.
    [178] Lucic B, Trinajstic N A. New Efficient Approach for Variable Selection Based on Multiregression: Prediction of Gas Chromatographic Retention Times and Response Factors. J. Chem. Inf. Comput. Sci. 1999, 39: 610-621.
    [179] Ivanciuc O, Taraviras S L, Cabrol-Bass D. Quasi-orthogonal Basis Sets of Molecular Graph Descriptors as a Chemical Diversity Measure. J. Chem. Inf. Comput. Sci. 2000, 40: 126-134.
    [180] Amic D, Davidovic-Amic D, Trinajstic N. Calculation of retention times of anthocyanins with orthogonalized topological indices J. Chem. Inf. Comput. Sci. 1995, 35: 136-139.
    [181] Soskic M, Plavsic D, Trinajstic N. 2-Difluoromethylthio-4,6-bis-(monoalkylamino)-1,3,5-triazines as inhibitors of Hill reaction: A QSAR study with orthogonal descriptors. J. Chem. Inf. Comput. Sci. 1996, 36: 829-832.
    [182] Mardia K V, Kent J, Bibby J. Multivariate Analysis New York: Academic Press, 1979.
    [183] Yao Y Y, Xu L, Yuan X S. A new topological index for research on structure-property relationship of alkanes. Chinese Acta Chimica Sinica, 1993, 51 : 463-469.
    [184] Hu C Y, Xu L. On highly discriminating molecular topological index. J. Chem. Inf. Comput. Sci. 1996, 36: 82-90.
    [185] Katritzky A R, Chen K, Maran U, Carlson D A. QSPR correlation and predictions of GC retention indexes for methyl-branched hydrocarbons produced by insects. Anal. Chem. 2000, 72: 101-109.
    [186] Heberger K, Gorgenyi M. Principal component analysis of Kovats indices for carbonyl compounds in capillary gas chromatography. J. Chromatogr. A, 1999, 845: 21-31.
    [187] Yan A X, Zhang R S, Liu M C, Hu Z D, Hooper M A, Zhao Z F. Large artificial neural networks applied to the prediction of retention indices of acyclic and cyclic alkanes, alkenes, alcohols, esters, ketones and ethers. Comput. Chem. 1998, 22: 405-412.
    [188] Sutter J M, Peterson T A, Jurs P C. Prediction of gas chromatographic retention indices of alkylbenzenes. Anal. Chim. Acta, 1997, 342: 113-122.
    [189] Yin C S, Liu W, Li Z L, Pan Z X, Lin T, Zhang M S. Chemometrics to chemical modeling: structural coding in hydrocarbons and retention indices of gas chromatography. J. Sep. Sci., 2001, 24(3): 213-220.
    [190] Buja A, Duffy D, Hastie T, Tibshirani R. Discussion of "Multivariate adaptive regression splines". The Annals of Statistics, 1991, 19: 93-98.
    [191] Rucker G, Rucker C. On topological indices, boiling points, and cycloalkanes. J. Chem. Inf. Comput. Sci. 1999, 39: 788-802.
    [192] Woloszyn T F, Jurs P C. Prediction of gas chromatographic retention data for hydrocarbons from naphthas. Anal. Chem. 1993, 65: 582-587.
    [193] Pompe M, Novic M. Prediction of gas-chromatographic retention indices using topological descriptors. J. Chem. Inf. Comput. Sci., 1999, 39: 59-67.
    [194] Balaban A T. Applications of graph theory in chemistry. J. Chem. Inf. Comput. Sci., 1985, 25: 334-343.
    [195] Sojak L, Ostrovsky I, Kubinec R, Kraus G, Kraus A. High-resolution gas chromatography with liquid crystal glass capillaries. XI. Separation of isomeric C8 and C9 hydrocarbons. J. Chromatogr. 1990, 509: 93-99.
    [196] Sojak L, Ostrovsky I, Janak J. Propyl effect and retention-structure correlation as a means of gas chromatographic identification. J. Chromatogr. 1987, 406: 43-49.
    [197] Sojak L, Ostrovsky I, Kubinec R, Kraus G, Kraus A. Separation and identification of all isomeric n-nonadecenes by capillary gas chromatography on a mesogenic stationary phase with Fourier transform infrared and mass spectrometric detection. J. Chromatogr. 1991, 609: 283-288.
    [198] Sojak L, Krupcik J, Janak J. Gas chromatography of all C15-C18 linear alkenes on capillary columns with very high resolution power. J. Chromatogr. 1980, 195: 43-64.
    [199] Sojak L, Ostrovsky I, Farkas P, Janak J. High-resolution gas chromatography with liquid crystal glass capillaries. IX. Separation of isomeric C9-C11 n-alkenes and n-alkanes. J. Chromatogr. 1986, 356: 105-114.
    [200] Sojak L, Ostrovsky I, Kubinec R, Kraus G, Kraus A. High-resolution gas chromatography with liquid crystal glass capillaries. XII. Separation of isomeric C17-C18 n-alkenes. J. Chromatogr. 1990, 520: 75-83.
    [201] Sojak L, Krupcik J, Tesarik L, Janak J. Correlation of the boiling points of non-branched C6 and C10 olefins with the gas chromatographic retention indices. J. Chromatogr. 1972, 65: 93-102.
    [202] Sojak L, Majer P, Skalak P, Janak J. Identification of straight-chain undecenes by capillary gas chromatography on squalane. J. Chromatogr. 1972, 65: 137-142.
    [203] Rohrbaugh R H, Jurs P C. Prediction of gas chromatographic retention indexes of selected olefins. Anal. Chem. 1985, 57: 2770-2773.
    [204] Kaliszan R. Quantitative structure-chromatographic retention relationships. Wiley, New York, 1987.
    [205] James A T, Martin A J P. Gas-liquid partition chromatography: the separation and microestimation of volatile fatty acids from formic acid to dodecanoic acid. Biochem. J., 1952, 50: 679-690.
    [206] Ray N H. Gas chromatography. I. The separation and estimation of volatile organic compounds by gas-liquid partition chromatography. J. Appl. Chem., 1954, 4: 21-25.
    [207] Maeder M. Evolving factor analysis for the resolution of overlapping chromatographic peaks. Anal. Chem. 1987, 59: 527.
    [208] Maeder M, Zilian A. Evolving factor analysis, a new multivariate technique in chromatography. Chemom. Intell. Lab. Syst. 1988, 3: 205-213.
    [209] Keller H R, Massart D L. Peak purity control in liquid chromatography with photodiode-array detection by a fixed size moving window evolving factor analysis. Anal. Chim. Acta 1991, 246: 379-390.
    [210] Kvalheim O M, Liang Y Z. Heuristic evolving latent projections: resolving two-way multicomponent data. Part 1. Selectivity, latent-projective graph, datascope, local rank and unique resolution. Anal. Chem. 1992, 64: 936-945.
    [211] Liang Y Z, Kvalheim O M, Keller H R et al. Heuristic evolving latent projections-resolving two-way multicomponent data. Part 2. Detection and resolution of minor constituents. Anal. Chem. 1992, 64: 946-953.
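    [212] Shen H L, Liang Y Z, Yu R Q, Li X C, Sun X X. Resolution of polycyclic aromatic hydrocarbons in atmospheric particulate matter of Hong Kong by the HELP method. Science in China (Series B), 1998, 27: 556-563.
    [213] Gong F, Liang Y Z, Xu Q S, Chau F T. Gas chromatography-mass spectrometry and chemometric resolution applied to determination of essential oils in Cortex Cinnamomi. J. Chromatogr. A, 2001, 909: 237-247.
    [214] He L, Dong Y Y, Li X M, Yao C L, Fang Z Y. Capillary gas chromatographic analysis of the C5 fraction and its carboxylation products. Chinese Journal of Chromatography, 1999, 17(3): 259-261.
    [215] Tian S S. Capillary gas chromatographic analysis of xylene rectification column bottoms and crude C9 aromatics. Chinese Journal of Chromatography, 1993, 11(4): 202-206.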
    [216] Bock R D. Multilevel Analysis of Educational Data. Academic Press, San Diego, 1989.
    [217] Bryk A S, Raudenbush S W. Hierarchical Linear Models: Applications and Data Analysis Methods. Sage, Newbury Park, CA, 1992.
