Band gap prediction for new materials based on high-throughput calculation and machine learning
  • English title: New materials band gap prediction based on the high-throughput calculation and the machine learning
  • Authors: XU YongLin (徐永林); WANG XiangMeng (王香蒙); LI Xin (李鑫); XI LiLi (席丽丽); NI JianYue (倪剑樾); ZHU WenHao (朱文浩); ZHANG Wu (张武); YANG Jiong (杨炯)
  • Affiliations: School of Computer Engineering and Science, Shanghai University; Materials Genome Institute of Shanghai University
  • Keywords: diamond-like structures; band gap; component substitution; machine learning; ensemble learning
  • Chinese journal name: JEXK
  • English journal name: Scientia Sinica (Technologica)
  • Publication date: 2018-12-12 11:13
  • Publisher: 中国科学:技术科学 (Scientia Sinica Technologica)
  • Year: 2019
  • Issue: v.49
  • Funding: National Key R&D Program of China (No. 2017YFB0701501); Key Project of the Major Research Plan of the National Natural Science Foundation of China (No. 91630206)
  • Language: Chinese
  • Pages: JEXK201901005
  • Page count: 11
  • CN: 01
  • ISSN: 11-5844/TH
  • Classification number: 48-58
Abstract
Band gaps often play an important role in functional-material applications: optoelectronic materials are generally wide-band-gap semiconductors, whereas thermoelectric materials are narrow-band-gap semiconductors. Rapid and accurate band gap prediction for a specified class of material systems is therefore of great scientific importance for these applications. However, obtaining high-precision band gaps through first-principles high-throughput calculations is time-consuming and inefficient, and systematically measuring the band gaps of a large number of material systems by experiment is unrealistic, so statistics-based machine learning prediction is a promising alternative. This paper designs an ensemble learning model for predicting band gap values effectively and accurately. Starting from diamond-like thermoelectric compounds whose band gaps had already been calculated, a single-component substitution strategy was used to generate a large batch of similar compounds, and a duplicate-checking technique filtered out repeated systems, yielding 356 unique, similar material systems. Machine learning techniques were then used to build an efficient band gap prediction model, with which the band gap values of 50 similar material systems were predicted and verified. Experiments show that the prediction model reaches an accuracy of 77.73% and is sufficiently robust and stable to be widely applied in thermoelectric-materials research scenarios that require band gap prediction in large batches.
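The generate-then-deduplicate step described in the abstract can be illustrated with a minimal sketch. It is not the authors' implementation: the substitution table `SUBSTITUTES`, the parent compounds, and the composition-level key used to detect duplicates are all assumptions made for illustration, whereas the paper's duplicate check operates on structures.

```python
# Hypothetical substitution table (an assumption, not taken from the paper):
# each key may be replaced by any element in its list.
SUBSTITUTES = {
    "Cu": ["Ag"],
    "In": ["Ga", "Al"],
    "Te": ["Se", "S"],
}


def canonical(comp):
    """Order-independent key for a composition given as {element: count}."""
    return tuple(sorted(comp.items()))


def single_substitutions(comp):
    """Yield every composition obtained by swapping exactly one element."""
    for elem, count in comp.items():
        for new in SUBSTITUTES.get(elem, []):
            if new in comp:
                continue  # would merge two species; skipped in this sketch
            child = {k: v for k, v in comp.items() if k != elem}
            child[new] = count
            yield child


def generate_unique(parents):
    """Substitute into each parent and drop children already seen."""
    seen = {canonical(p) for p in parents}  # parents count as known systems
    unique = []
    for parent in parents:
        for child in single_substitutions(parent):
            key = canonical(child)
            if key not in seen:
                seen.add(key)
                unique.append(child)
    return unique


# Two illustrative diamond-like parents; the second is a substitution
# product of the first, so it must not be generated again.
parents = [{"Cu": 1, "In": 1, "Te": 2}, {"Ag": 1, "In": 1, "Te": 2}]
candidates = generate_unique(parents)
print(len(candidates), "unique candidates")
```

A composition key cannot distinguish polymorphs that share a formula, so a real pipeline would need a structure-aware comparison at the `canonical` step.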
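Formula (4) is the Spearman coefficient:

$$\rho = \frac{\sum_{i=1}^{N}(x_i-\bar{x})(y_i-\bar{y})}{\sqrt{\sum_{i=1}^{N}(x_i-\bar{x})^{2}\,\sum_{i=1}^{N}(y_i-\bar{y})^{2}}} \qquad (4)$$

where x_i is the value of feature x for the i-th sample in the training set and \bar{x} is the mean of that feature over all samples; y_i is the value of feature y for the i-th sample and \bar{y} is its mean; i = 1, 2, ..., N, where N is the total number of samples in the training set.

In addition, the Lasso and GBDT models provide built-in feature-importance scores during training. Figure 6 evaluates feature importance using the Spearman coefficient and the feature scores of the Lasso and GBDT models. As Figure 6 shows, the electronegativity difference "ED" and the average cation mass of the compound "CAAM" receive relatively high scores under all of these scoring methods, which indicates that these two features strongly influence the performance of the trained model.

[...] a new-material band gap prediction model based on the substitution strategy and machine learning methods, and prediction experiments were carried out on the band gaps of component-substitution products of diamond-like compounds, a typical class of thermoelectric materials. Through feature selection, grid-search tuning, and model combination, and by combining the strengths of three different models (Lasso, SVR, and GBDT), an ensemble learning band gap prediction model, Ensemble, with a prediction accuracy of 77.73% was obtained. The electronegativity difference "ED" and the average cation atomic mass "CAAM" of the diamond-like compounds were found to have an important influence on the target variable, the band gap. The ensemble learning model has strong learning ability and stable predictive performance. It can be applied to predicting the band gaps of other types of material systems, as well as other material properties, which accelerates the design and optimization of material performance and is of significant scientific value for the rapid screening and high-performance prediction of new functional materials.

Supplementary material
The supplementary material for this paper is available in the online version at techcn.scichina.com. It consists of original data provided by the authors, who are responsible for its academic quality and content.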
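The Spearman-based feature scoring described for formula (4) can be sketched in plain Python as the Pearson correlation of rank-transformed values. This is an illustrative sketch only: the feature names "ED" and "CAAM" come from the text, but every number below, including the `noise` column, is invented toy data and not a result from the paper.

```python
from math import sqrt


def average_ranks(values):
    """Rank values from 1..N; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j across the run of values equal to values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def pearson(x, y):
    """Correlation of deviations from the mean, as in formula (4)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman coefficient: Pearson correlation applied to the ranks."""
    return pearson(average_ranks(x), average_ranks(y))


# Toy data (invented for illustration; NOT values from the paper).
band_gap = [0.2, 0.5, 0.9, 1.4, 2.0]
features = {
    "ED": [0.1, 0.3, 0.4, 0.8, 1.1],
    "CAAM": [120, 100, 90, 60, 40],
    "noise": [3, 1, 4, 1, 5],
}

# Score each feature by the magnitude of its rank correlation with the target.
scores = {name: abs(spearman(col, band_gap)) for name, col in features.items()}
ranking = sorted(scores, key=scores.get, reverse=True)
print(ranking)
```

Using the absolute value treats strongly negative and strongly positive monotonic relationships as equally informative, which is one common choice for feature screening; whether the paper ranks by signed or absolute score is not stated in the recovered text.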