Abstract
To identify critical-to-quality characteristics (CTQs) from imbalanced product manufacturing data, a feature selection algorithm based on NSGA-II is proposed. First, to account for data imbalance, the type II error rate is introduced, alongside the classification error rate and the feature subset size, to measure the importance of a quality characteristic subset. Second, the multi-objective evolutionary algorithm NSGA-II is applied to minimize these three metrics, yielding a non-dominated solution set. Finally, the ideal point method is used to select the best compromise solution from the non-dominated solution set, which gives the CTQ set. Experimental results show that the proposed algorithm achieves high classification accuracy while effectively reducing the type II error rate and the size of the CTQ set, which demonstrates its effectiveness.
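The three metrics and the final selection step described in the abstract can be illustrated with a minimal Python sketch. It does not implement NSGA-II itself; it only shows how one candidate feature subset might be scored and how a compromise solution might be picked from a set of scored candidates. Several points are assumptions not stated in the abstract: the type II error rate is taken as the fraction of minority-class (positive) samples predicted as the majority class, each objective is min-max normalized before distances are computed, the distance to the ideal point is Euclidean, and all function names and candidate values are hypothetical.

```python
from math import sqrt


def objective_vector(y_true, y_pred, n_selected, positive=1):
    """Return (error rate, type II error rate, subset size); all are minimized.

    Assumption: type II error rate = share of positive (minority) samples
    that the classifier labels as the other class.
    """
    errors = sum(t != p for t, p in zip(y_true, y_pred))
    positives = sum(t == positive for t in y_true)
    missed = sum(t == positive and p != positive for t, p in zip(y_true, y_pred))
    type2 = missed / positives if positives else 0.0
    return (errors / len(y_true), type2, n_selected)


def dominates(a, b):
    """True if a is no worse than b on every objective and better on one."""
    return all(x <= y for x, y in zip(a, b)) and any(x < y for x, y in zip(a, b))


def non_dominated(points):
    """Keep only the points that no other point dominates."""
    return [p for p in points if not any(dominates(q, p) for q in points)]


def ideal_point_choice(front):
    """Pick the point closest to the ideal point (per-objective minimum).

    Assumption: objectives are min-max normalized over the front and the
    distance is Euclidean.
    """
    lows = [min(col) for col in zip(*front)]
    highs = [max(col) for col in zip(*front)]

    def distance(p):
        total = 0.0
        for v, lo, hi in zip(p, lows, highs):
            if hi > lo:  # skip objectives that are constant over the front
                total += ((v - lo) / (hi - lo)) ** 2
        return sqrt(total)

    return min(front, key=distance)


# Hypothetical scored candidates: (error rate, type II error rate, subset size)
candidates = [
    (0.10, 0.40, 12),
    (0.12, 0.20, 8),
    (0.15, 0.10, 5),
    (0.15, 0.25, 9),
    (0.30, 0.05, 3),
]
front = non_dominated(candidates)
best = ideal_point_choice(front)
```

In this made-up example, the candidate (0.15, 0.25, 9) is dominated by (0.12, 0.20, 8) and drops out of the front. The remaining four candidates trade the three objectives against each other, and the ideal point step selects the one that lies closest to the unattainable point combining the best value of every objective.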