Research on the Prediction of Synthesis Reactions of Open-Framework Aluminophosphates
Abstract
The applications of microporous inorganic materials are closely tied to their pore structures: differences in the dimensionality, shape, and volume of the pores lead to large differences in application. Microporous inorganic crystals are widely used in catalysis, adsorption, separation, and ion exchange because of their unique, regular pore structures. The design and synthesis of microporous crystals with novel structures, and the development of new synthesis routes, have therefore attracted sustained attention. Among these materials, open-framework metal phosphates have been studied extensively, both in China and abroad, because of their structural diversity and potential applications. The synthesis of microporous inorganic crystals is very complex, and crystallization is affected by many factors, such as the source materials, gel composition, pH, template, solvent, and crystallization temperature and time. The main difficulties in studying the synthesis of these materials are that the synthesis process is hard to control and that the crystallization mechanism is complex and hard to understand and model. In recent years, researchers have tried to build predictive models for new synthesis methods, in particular by applying statistical methods to the rational design of target materials, in the hope of obtaining well-performing models for specific structures that can guide the synthesis of new materials. Although such statistical methods have been widely applied in chemical materials analysis with good results, relatively little work has addressed analysis and prediction for the synthesis of open-framework aluminophosphates (AlPOs).
     In view of the rich structural chemistry of open-framework AlPOs, this thesis applies statistics-based machine learning theory and methods to analyze and predict the structures of AlPO molecular sieves. The methods serve two purposes: to quantify how strongly the synthesis parameters influence specific structural features of the product, which helps explain how those structures form; and to build models that predict the pore ring size and the type of the product from the synthesis parameters, which raises the success rate of rational synthesis experiments. The study is divided into the following two parts:
     Part I: Several statistics-based machine learning methods are used to analyze and predict the relationship between the synthesis parameters and the product structures in the AlPO synthesis database, as follows.
     1. Because the synthesis parameters in the database are strongly correlated, and partial least squares (PLS) handles multicollinearity among variables well, PLS is used to analyze how strongly the synthesis parameters influence the prediction of specific product structures. Principal component analysis (PCA) is used to extract composite information about some specific product structures, and a regression model from the synthesis parameters to those structures is established.
     2. For synthesis reactions that use the same template, a back-propagation neural network (BPNN) is used to analyze how the gel composition and combinations of its components influence the prediction of the product type.
     3. Because the support vector machine (SVM) copes well with nonlinearity, high dimensionality, and local minima, it is used to predict the pore ring size and the type of the product, and to identify the template parameters that matter for producing materials with a specific pore ring size and structure type. Cross-validation is used to further improve the reliability of the classifier.
     4. Because multiple linear regression (MLR) requires that the variables not be strongly correlated, ridge regression (RR) is used instead to build a model that predicts the product type from the synthesis parameters. The effect of the choice of ridge parameter and threshold on prediction performance is studied in detail.
     5. A method that combines PLS with logistic regression (LR), named PLS-LR, is also used to predict the product type from the synthesis parameters. First, PLS removes the correlation among the synthesis parameters and yields new low-dimensional variables. Then, LR predicts the product type from those low-dimensional variables. Finally, the number of PLS components is chosen according to its effect on the prediction results.
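The PLS-LR pipeline in item 5 can be illustrated with a minimal sketch. It is not the thesis's implementation: it keeps only the first PLS component (the thesis selects the number of components from the prediction results), fits the logistic model by plain gradient descent, and runs on made-up data. All function names, the learning rate, and the epoch count are assumptions introduced for illustration.

```python
import math

def first_pls_component(X, y):
    """Return (column means, unit weight vector w) of the first PLS1 component."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    # w is proportional to Xc^T yc: the covariance of each centered column with y.
    w = [sum((X[i][j] - means[j]) * (y[i] - y_mean) for i in range(n))
         for j in range(p)]
    norm = math.sqrt(sum(v * v for v in w))
    return means, [v / norm for v in w]

def scores(X, means, w):
    """Project centered rows of X onto w, giving one latent variable per sample."""
    return [sum((row[j] - means[j]) * w[j] for j in range(len(w))) for row in X]

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(t, y, lr=0.1, epochs=2000):
    """Fit p(y=1 | t) = sigmoid(a*t + b) by batch gradient descent."""
    a = b = 0.0
    n = len(t)
    for _ in range(epochs):
        grad_a = grad_b = 0.0
        for ti, yi in zip(t, y):
            err = sigmoid(a * ti + b) - yi
            grad_a += err * ti
            grad_b += err
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def predict(X, means, w, a, b, threshold=0.5):
    """Classify each row as 1 if its predicted probability reaches the threshold."""
    return [1 if sigmoid(a * ti + b) >= threshold else 0
            for ti in scores(X, means, w)]

# Made-up data: columns 0 and 1 are almost collinear, column 2 is noise.
X = [[1.0, 2.1, 0.3], [1.5, 3.0, 0.9], [2.0, 4.1, 0.1],
     [4.0, 8.0, 0.8], [4.5, 9.1, 0.2], [5.0, 9.9, 0.6]]
y = [0, 0, 0, 1, 1, 1]

means, w = first_pls_component(X, y)
a, b = fit_logistic(scores(X, means, w), y)
predictions = predict(X, means, w, a, b)
```

The two correlated columns collapse into a single latent variable before the logistic model sees them, which is the reason for placing PLS in front of LR.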
     Extensive experiments and analysis demonstrate that these statistics-based machine learning methods can reveal how strongly the synthesis parameters influence specific structural features of the product, and that they yield well-performing models for predicting specific product structures and types from the synthesis parameters.
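The roles of the ridge parameter and the threshold mentioned in item 4 of Part I can be seen in a small sketch. This is an illustration under stated assumptions, not the thesis's model: the product type is coded as 0 or 1, the regression output is cut at a threshold to obtain a class, the data are made up, and the names `solve`, `ridge_fit`, and `ridge_classify` are hypothetical.

```python
def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def ridge_fit(X, y, k):
    """Return (means, y_mean, beta) with beta = (Xc^T Xc + k I)^(-1) Xc^T yc."""
    n, p = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(p)]
    y_mean = sum(y) / n
    Xc = [[row[j] - means[j] for j in range(p)] for row in X]
    yc = [v - y_mean for v in y]
    # Adding k to the diagonal is what keeps the system solvable under collinearity.
    A = [[sum(Xc[i][r] * Xc[i][c] for i in range(n)) + (k if r == c else 0.0)
          for c in range(p)] for r in range(p)]
    rhs = [sum(Xc[i][r] * yc[i] for i in range(n)) for r in range(p)]
    return means, y_mean, solve(A, rhs)

def ridge_classify(X, model, threshold=0.5):
    """Cut the regression output at the threshold to obtain a 0/1 class."""
    means, y_mean, beta = model
    out = []
    for row in X:
        score = y_mean + sum((row[j] - means[j]) * beta[j] for j in range(len(beta)))
        out.append(1 if score >= threshold else 0)
    return out

# Made-up data: the second column is exactly twice the first.
X = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [6.0, 12.0], [7.0, 14.0], [8.0, 16.0]]
y = [0, 0, 0, 1, 1, 1]

model_small_k = ridge_fit(X, y, k=1.0)
model_large_k = ridge_fit(X, y, k=10.0)
predictions = ridge_classify(X, model_small_k, threshold=0.5)
```

With `k = 0` the matrix is singular for these exactly collinear columns, which is the situation ridge regression is meant to avoid. Raising `k` shrinks the coefficients, and moving `threshold` shifts the balance between the two predicted classes, so both settings affect prediction performance.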
     Part II: To address the class imbalance in the AlPO synthesis database, new resampling methods are proposed.
     Class imbalance degrades classifier performance. To handle the class imbalance in the prediction experiments (for example, a 1:3 ratio between the two classes), this thesis proposes two guided over-sampling methods based on unsupervised fuzzy c-means (FCM) clustering, named FCMP1 and FCMP2, and two guided combined-sampling methods, named FCMP1+Tomek and FCMP2+Tomek. These methods account for both between-class and within-class imbalance, which overcomes the blind sampling of existing methods. In addition, the combined-sampling methods remove noisy or borderline samples from both classes at once, which makes the two classes more separable. Experimental results show that predictions on the resampled data sets are clearly better than those on the original data. Compared with several existing resampling methods, the proposed methods show better predictive performance.
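The Tomek step of the combined-sampling methods can be sketched as follows. The abstract does not specify how FCMP1 and FCMP2 generate samples, so the sketch covers only the standard Tomek-link cleaning: two samples form a Tomek link when they are each other's nearest neighbor and carry different labels, and both members are dropped, in line with removing samples from both classes. The data and the function names are made up for illustration.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def tomek_links(points, labels):
    """Return index pairs (i, j), i < j, that form Tomek links."""
    def nearest(i):
        return min((j for j in range(len(points)) if j != i),
                   key=lambda j: squared_distance(points[i], points[j]))
    nn = [nearest(i) for i in range(len(points))]
    # A link needs mutual nearest neighbors with opposite labels.
    return [(i, j) for i, j in enumerate(nn)
            if i < j and nn[j] == i and labels[i] != labels[j]]

def remove_tomek_links(points, labels):
    """Drop both members of every Tomek link."""
    drop = {idx for pair in tomek_links(points, labels) for idx in pair}
    keep = [i for i in range(len(points)) if i not in drop]
    return [points[i] for i in keep], [labels[i] for i in keep]

# Made-up data: label 1 is the minority class; samples 2 and 3 sit on the border.
points = [(0.0, 0.0), (0.5, 0.0), (3.0, 3.0),
          (3.2, 3.0), (5.0, 5.0), (5.0, 5.5),
          (6.0, 5.0), (6.0, 6.0), (5.5, 6.0)]
labels = [1, 1, 1, 0, 0, 0, 0, 0, 0]

links = tomek_links(points, labels)
clean_points, clean_labels = remove_tomek_links(points, labels)
```

In a combined scheme, this cleaning would run after the minority class has been over-sampled, so that borderline pairs created or exposed by the new samples are removed as well.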
     In summary, this thesis uses statistics-based machine learning methods to build a series of models that predict specific product structures from the synthesis parameters in the AlPO synthesis reaction database, and proposes new resampling methods that address class imbalance and improve predictive performance. This work should make the rational design of molecular sieve frameworks more direct and efficient and reduce experimental cost. It is particularly useful as guidance for designing molecular sieve frameworks with special structures to meet functional requirements.
