Research on Attribute Selection Based on BP Neural Networks
Abstract
Data mining is a technique for extracting useful information from large-scale data, and data preprocessing is an important step in any data mining task, particularly when mining massive, high-dimensional data. The data used for analysis may contain hundreds of attributes, many of them irrelevant to the mining task, so using attribute selection to find the smallest attribute set is especially important for improving the efficiency of data mining.
     Many tools exist for classification data mining. One of them is the neural network, of which the BP (backpropagation) neural network is the most commonly used. Existing neural network attribute selection methods have shortcomings, however. Learning algorithms of this kind are not very efficient to begin with, and if all attributes of the data set are used to train and prune the network, the network becomes too large, the volume of training input becomes excessive, and learning efficiency suffers. Overcoming these shortcomings requires a new method that improves on the existing ones.
     This paper proposes an improved neural network attribute selection method that combines the advantages of the Wrapper and Filter models of attribute selection. The method mitigates the shortcomings of BP neural network attribute selection, speeds up BP neural network prediction, and improves the network's classification accuracy. Sensitivity analysis is first used to rank the attributes in the initial attribute set. Secondary attributes are then removed one at a time according to that ranking, and the prediction and classification accuracy of the BP neural network is measured after each removal. Comparing the accuracy across these cases yields the minimal optimal attribute set. Simulation experiments in MATLAB compare the classification accuracy and efficiency of the network before and after attribute selection, and the results indicate that the method works well.
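The procedure described in the abstract (rank attributes by sensitivity, drop the least important ones one at a time, compare accuracy) can be sketched roughly as follows. This is a minimal illustration in Python, not the MATLAB code used in the paper, and it rests on several assumptions that the source does not state: sensitivity is taken to be the mean absolute partial derivative of the network output with respect to each input; the network has one tanh hidden layer and a sigmoid output trained by per-sample gradient descent; the data set is synthetic, with only attribute 0 determining the class; and the chosen subset is the smallest one whose test accuracy is within a tolerance `tol` of the best. All function names are hypothetical.

```python
import math
import random


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def forward(net, x):
    # Return the hidden activations and the output probability for one sample.
    W, v = net  # W: hidden x (n_in + 1), v: hidden + 1; the last entry is a bias
    h = [math.tanh(sum(wi * xi for wi, xi in zip(w, x)) + w[-1]) for w in W]
    return h, sigmoid(sum(vj * hj for vj, hj in zip(v, h)) + v[-1])


def train_bp(X, y, hidden=4, epochs=100, lr=0.1, seed=0):
    # Assumed setup: one hidden layer, cross-entropy loss, per-sample updates.
    rng = random.Random(seed)
    n_in = len(X[0])
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_in + 1)] for _ in range(hidden)]
    v = [rng.uniform(-0.5, 0.5) for _ in range(hidden + 1)]
    for _ in range(epochs):
        for x, t in zip(X, y):
            h, out = forward((W, v), x)
            d_out = out - t  # error signal at the output pre-activation
            for j in range(hidden):
                d_h = d_out * v[j] * (1.0 - h[j] ** 2)  # backpropagated error
                v[j] -= lr * d_out * h[j]
                for i in range(n_in):
                    W[j][i] -= lr * d_h * x[i]
                W[j][-1] -= lr * d_h
            v[-1] -= lr * d_out
    return W, v


def sensitivities(net, X):
    # Assumed measure: mean |d(output)/d(input_i)| over the samples.
    W, v = net
    n_in = len(X[0])
    total = [0.0] * n_in
    for x in X:
        h, out = forward(net, x)
        for i in range(n_in):
            d = sum(v[j] * (1.0 - h[j] ** 2) * W[j][i] for j in range(len(W)))
            total[i] += abs(out * (1.0 - out) * d)
    return [s / len(X) for s in total]


def accuracy(net, X, y):
    return sum((forward(net, x)[1] > 0.5) == (t == 1) for x, t in zip(X, y)) / len(X)


def select_attributes(Xtr, ytr, Xte, yte, tol=0.02):
    # Filter step: rank attributes by the sensitivity of a network trained on all of them.
    s = sensitivities(train_bp(Xtr, ytr), Xtr)
    ranking = sorted(range(len(s)), key=lambda i: -s[i])
    # Wrapper step: drop the least important attribute each round, retrain, and score.
    results = []
    for k in range(len(ranking), 0, -1):
        cols = ranking[:k]
        sub_tr = [[row[c] for c in cols] for row in Xtr]
        sub_te = [[row[c] for c in cols] for row in Xte]
        results.append((cols, accuracy(train_bp(sub_tr, ytr), sub_te, yte)))
    best = max(acc for _, acc in results)
    selected = min((cols for cols, acc in results if acc >= best - tol), key=len)
    return ranking, results, selected


def make_data(n, seed):
    # Synthetic data: attribute 0 decides the class, attributes 1 and 2 are noise.
    rng = random.Random(seed)
    X = [[rng.uniform(-1, 1) for _ in range(3)] for _ in range(n)]
    return X, [1 if row[0] > 0 else 0 for row in X]


Xtr, ytr = make_data(150, seed=1)
Xte, yte = make_data(100, seed=2)
ranking, results, selected = select_attributes(Xtr, ytr, Xte, yte)
print("ranking:", ranking)
for cols, acc in results:
    print("attributes", cols, "test accuracy", round(acc, 3))
print("selected:", selected)
```

Ranking once with the full network and then retraining only on nested subsets keeps the number of training runs equal to the number of attributes, which is one plausible reading of how the Filter and Wrapper ideas are combined here.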
