Abstract
The availability of large amounts of data generated by high-throughput computing and experimentation has spurred interest in applying machine learning techniques to materials science. Machine learning of materials behavior requires feature vectors that capture the compositional or structural information influencing a target property. We present methods for assessing the similarity of compositions, substructures, and crystal structures. Similarity measures are important for the classification and clustering of data points, allowing for the organization of data and the prediction of materials properties. The similarity functions between ions, compositions, substructures, and crystal structures are based on the data-mined probability with which two ions substitute for each other within the same structure prototype. The composition similarity is validated via the prediction of crystal structure prototypes for oxides from the Inorganic Crystal Structure Database. It performs particularly well on quaternary oxides, predicting the correct prototype within 5 guesses 90% of the time. The substructural similarity is validated via the prediction of Li insertion sites in the oxides; it finds all of the Li sites with fewer than 8 incorrect guesses 90% of the time.
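As a rough illustration of how a substitution-based similarity could be used to rank candidate prototypes, the following minimal Python sketch may help. It is not the similarity function developed in this work: the substitution counts are invented for illustration, the normalization by the largest count is a stand-in for the data-mined substitution probability, and the names `SUBSTITUTION_COUNTS`, `ion_similarity`, `composition_similarity`, and `rank_prototypes` are hypothetical.

```python
from itertools import permutations

# Hypothetical substitution counts: how often two ions are observed on
# equivalent sites of the same structure prototype. These numbers are
# illustrative only and are NOT taken from the ICSD.
SUBSTITUTION_COUNTS = {
    frozenset(["Na+", "K+"]): 40,
    frozenset(["Na+", "Li+"]): 30,
    frozenset(["K+", "Li+"]): 10,
    frozenset(["Cl-", "Br-"]): 50,
}


def ion_similarity(a, b, counts=SUBSTITUTION_COUNTS):
    """Similarity of two ions in [0, 1]; identical ions score 1.

    Assumption: counts are normalized by the largest observed count,
    a simple stand-in for a data-mined substitution probability.
    """
    if a == b:
        return 1.0
    return counts.get(frozenset([a, b]), 0) / max(counts.values())


def composition_similarity(comp_a, comp_b):
    """Best product of ion similarities over all one-to-one ion pairings.

    Compositions are tuples of ion labels; stoichiometry is ignored here.
    """
    if len(comp_a) != len(comp_b):
        return 0.0
    best = 0.0
    for perm in permutations(comp_b):
        score = 1.0
        for a, b in zip(comp_a, perm):
            score *= ion_similarity(a, b)
        best = max(best, score)
    return best


def rank_prototypes(query, known, top_k=5):
    """Return up to top_k prototype labels, most similar composition first.

    `known` is a list of (composition, prototype_label) pairs.
    """
    scored = [(composition_similarity(query, comp), label) for comp, label in known]
    scored.sort(key=lambda item: item[0], reverse=True)
    return [label for _, label in scored[:top_k]]


if __name__ == "__main__":
    known = [
        (("Li+", "Br-"), "prototype_B"),
        (("K+", "Br-"), "prototype_A"),
    ]
    print(rank_prototypes(("Na+", "Cl-"), known))
```

The brute-force search over pairings is only practical for compositions with a handful of ions, which is adequate for a sketch; a real implementation would also need to account for stoichiometry and oxidation states.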