摘要
属性约简是粗糙集理论研究的一个基本问题,它是一种有效的数据约简方法。然而,目前很多的属性约简算法在面对高维数据集时仍然不够高效。文中利用图论的相关理论和方法,对基于区分矩阵的粗糙集属性约简方法给出了直观和等价的刻画。在此基础上提出了基于图论的粗糙集属性约简方法。实验结果表明,新的属性约简算法在面对较大规模的数据集,尤其是高维的数据集时,不仅能有效地降低数据的维数,同时运行速度快且能保持较高的分类精度。
Attribute reduction is a basic problem in Rough set theory,which is regarded as an effective data reduction method. However,many attribute reduction algorithms are not efficient enough to handle the highdimensional data sets. In this paper,the attribute reduction method with Rough sets based on the discernibility matrix is described intuitively and equivalently by using the theory and method of graph theory. Then,some graph-based approaches for the attribute reduction in Rough sets are proposed. The experimental results show that the new attribute reduction algorithms can not only effectively reduce the dimension of data,but also run fast and maintain high classification accuracy when dealing with large-scale data sets,especially high-dimensional data sets.
引文
[1] PAWLAK Z. Rough sets[J]. International Journal of Computer and Information Sciences,1982,11:341-356.
[2] BANERJEE M,MITRA S,PAL S K. Rough fuzzy MLP:Knowledge encoding and classification[J].IEEE Transactions on Neural Networks, 1998, 9:1203-1216.
[3] WEI W,LIANG J Y. Information fusion in Rough set theory:An overview[J]. Information Fusion,2019,48:107-118.
[4]王国胤,姚一豫,于洪.粗糙集理论与应用研究综述[J].计算机学报,2009,32(7):1229-1246.
[5]黄锦静,陈岱,李梦天.基于粗糙集的决策树在医疗诊断中的应用[J].计算机技术与发展,2017,27(12):148-152.
[6] WONG S K M,ZIARKO W. Optimal decision rules in decision table[J]. Bulletin of Polish Academy of Sciences,1985,33(11-12):693-696.
[7]杨传健,葛浩,汪志圣.基于粗糙集的属性约简方法研究综述[J].计算机应用研究,2012,29(1):16-20.
[8] WANG J,WANG J. Reduction algorithms based on discernibility matrix:The ordered attributes method[J]. Journal of Computer Science and Technology,2001,16:489-504.
[9]杨明.一种基于改进差别矩阵的属性约简增量式更新算法[J].计算机学报,2007,30(5):815-822.
[10] CHEN D G,ZHAO S Y,ZHANG L,et al. Sample pair selection for attribute reduction with Rough set[J]. IEEE Transaction on Knowledge and Data Engineering,2012,24(11):2080-2093.
[11]赵军,王国胤,吴中福,等.一种高效的属性核计算方法[J].小型微型计算机系统,2003,24(11):1950-1953.
[12]徐章艳,刘作鹏,杨炳儒,等.一个复杂度为max(O(|C||U|),O(|C|2|U/C|))的快速属性约简算法[J].计算机学报,2006,29(3):391-399.
[13] QIAN Y H,LIANG J Y,PEDRYCZ W,et al. Positive approximation:An accelerator for attribute reduction in rough set theory[J]. Artificial Intelligence,2010,174:597-618.
[14]苗夺谦,胡桂荣.知识约简的一种启发式算法[J].计算机研究与发展,1999,36(6):681-684.
[15]王国胤,于洪,杨大春.基于条件信息熵的决策表约简[J].计算机学报,2002,25(7):759-766.
[16] LIANG J Y,SHI Z Z. The information entropy,rough entropy and knowledge granulation in Rough set theory[J]. International Journal of Uncertainty,Fuzziness and Knowledge-Based Systems,2004,12:37-46.
[17] ZHANG X,LIU X,YANG Y Y. A fast feature selection algorithm by accelerating computation of Fuzzy Rough set-based information entropy[J]. Entropy,2018,20(10):1-13.
[18] THANGAVEL K,PETHALAKSHMI A. Dimensionality reduction based on Rough set theory:A review[J].Applied Soft Computing,2009,9:1-12.
[19] KULAGA P,SAPIECHA P,KRZYSZTOF S. Approximation Algorithm for the Argument Reduction Problem[M]∥Computer Recognition Systems. Berlin Heidelberg:Springer,2005:243-248.
[20] CHEN J K,LIN Y J,LIN G P,et al. The relationship between attribute reducts in Rough sets and minimal vertex covers of graphs[J]. Information Sciences,2015,325:87-97.
[21] PAWLAK Z. Rough Sets:Theoretical Aspects of Reasoning about Data[M]. Boston:Kluwer Academic Publishers,1991.
[22] HU X H,CERCONE N. Learning in relational databases:A Rough set approach[J]. Computational Intelligence,1995,11:323-338.
[23] SKOWRON A,RAUSZER C. The discernibility matrices and functions in information systems[C]∥Proc of Intelligent Decision Support-Handbook of Applications and Advances of the Rough Sets theory. Dordrecht:Kluwer Academic Publisher,1991:331-362.
[24] MIAO D Q,ZHAO Y,YAO Y Y,et al. Relative reducts in consistent and inconsistent decision tables of the Pawlak Rough set model[J]. Information Sciences,2009,179(24):4140-4150.
[25] BONDY J A,MURTY U S R. Graph Theory with Applications[M]. London:Macmillan,1976.
[26] CHVATAL V. A greedy heuristic for the set-covering problem[J]. Mathematics of Operations Research,1979(3):233-235.
[27] LI J. Scikit-feature feature selection repository[EB/OL]. http:∥featureselection. asu. edu/datasets. php.Arizona State University,2019-03-20.
[28] DUA D,GRAFF C. UCI machine learning repository[EB/OL]. http:∥archive. ics. uci. edu/ml/datasets.php. University of California,2019-03-20.