Abstract
Current cost-sensitive algorithms generally use static misclassification costs, which have strong limitations: they tend to overfit, and they cannot reflect the actual class distribution of a dataset. To address these shortcomings, this paper first proposes a dynamic misclassification cost mechanism. With the goal of minimizing the average total cost, the mechanism adaptively generates four different dynamic misclassification cost functions according to different test costs. Second, we redefine the minimal total cost feature (attribute) selection problem under dynamic misclassification costs. Finally, we design a simulated annealing algorithm to solve this problem. Experimental results show that the proposed scheme effectively selects the optimal misclassification cost, so that the selected feature set has the lowest average total cost.
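The optimization problem described above can be illustrated with a short Python sketch. It is not the paper's algorithm, only an example under stated assumptions. We assume the average total cost of a feature subset is the sum of its features' test costs plus the expected misclassification cost per object. We also model the dynamic misclassification cost as a caller-supplied function `misclassification_cost(subset, test_cost)`, a hypothetical interface, because the four cost functions are not specified here. The single-feature flip move, the geometric cooling schedule, the parameter values, and the toy data are all likewise assumptions.

```python
import math
import random
from typing import Callable, FrozenSet, Sequence, Tuple

# Hypothetical interface: maps (selected features, their total test cost) to the
# expected misclassification cost per object. A dynamic cost function could use
# the test cost argument; the toy function below ignores it.
MisclassificationCost = Callable[[FrozenSet[int], float], float]


def average_total_cost(subset: FrozenSet[int], test_costs: Sequence[float],
                       misclassification_cost: MisclassificationCost) -> float:
    """Average total cost per object: test costs of the selected features
    plus the expected misclassification cost when using only those features."""
    test_cost = sum(test_costs[i] for i in subset)
    return test_cost + misclassification_cost(subset, test_cost)


def simulated_annealing(test_costs: Sequence[float],
                        misclassification_cost: MisclassificationCost,
                        t0: float = 10.0, alpha: float = 0.95,
                        steps: int = 2000, seed: int = 0) -> Tuple[FrozenSet[int], float]:
    """Search feature subsets for the minimal average total cost (assumed settings)."""
    rng = random.Random(seed)
    n = len(test_costs)
    current = frozenset(range(n))  # start from the full feature set
    current_cost = average_total_cost(current, test_costs, misclassification_cost)
    best, best_cost = current, current_cost
    t = t0
    for _ in range(steps):
        # Neighbour: add or remove one randomly chosen feature.
        candidate = current ^ {rng.randrange(n)}
        candidate_cost = average_total_cost(candidate, test_costs, misclassification_cost)
        delta = candidate_cost - current_cost
        # Always accept improvements; accept worse moves with probability exp(-delta / t).
        if delta <= 0 or rng.random() < math.exp(-delta / t):
            current, current_cost = candidate, candidate_cost
            if current_cost < best_cost:
                best, best_cost = current, current_cost
        t = max(t * alpha, 1e-9)  # geometric cooling, floored to avoid division by zero
    return best, best_cost


# Toy data (invented for illustration): five features with these test costs.
TEST_COSTS = [3.0, 4.0, 2.0, 5.0, 1.0]


def toy_misclassification_cost(subset: FrozenSet[int], test_cost: float) -> float:
    # Hypothetical: only features 0 and 1 lower the error rate; each error costs 100.
    error_rate = 0.4 - 0.15 * (0 in subset) - 0.15 * (1 in subset)
    return error_rate * 100.0


if __name__ == "__main__":
    best, cost = simulated_annealing(TEST_COSTS, toy_misclassification_cost)
    print(sorted(best), round(cost, 2))
```

In the invented toy data, only features 0 and 1 reduce errors. Under these assumptions, the cheapest subset is {0, 1}, with an average total cost of about 17 (7 in test cost plus 10 in expected misclassification cost). Any other informative-feature combination costs more, and adding any other feature only adds test cost. The best-so-far state is tracked separately from the current state, so an uphill move accepted at high temperature cannot lose a good solution already found.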