Application and Comparative Analysis of Decision Trees and Neural Networks in the Telecom Industry
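Abstract
In the telecom industry, large volumes of data are generated every day, and these data may contain latent information. Data mining techniques can be applied to analyze the data and build models that extract useful, previously hidden information. Decision tree models and neural network models, two important classification and prediction models in data mining, are already widely used in the telecom industry.
     The main work of this thesis is as follows:
     First, it introduces and compares three decision tree algorithms: C4.5 (a decision tree based on information gain theory), CART (classification and regression trees), and CHAID (a decision tree based on the chi-square test), and on that basis proposes a new idea.
     Second, it introduces a typical neural network algorithm: the three-layer back-propagation neural network.
     Third, it presents an empirical analysis. It first describes the structure and composition of telecom data from a certain city, then preprocesses the data through extraction, transformation, and loading. It then builds a CHAID decision tree model and a neural network model on the preprocessed data, and finally uses evaluation charts to assess and compare the resulting models, select the better one, and interpret it.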
In the telecom industry, there are large amounts of data. Data mining can extract information from these complex data. As important data mining methods, decision trees and neural networks have been used in the telecom field.
     The main work of this paper is as follows:
     First, we analyze three decision tree methods: C4.5, CART (classification and regression tree), and CHAID (based on the chi-square test). We then present a new method.
     Second, we introduce a typical neural network method.
     Third, we give some examples and analyze the CHAID decision tree model as well as the neural network model.
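
The abstract names two split criteria: information gain (the basis of C4.5) and the chi-square test (the basis of CHAID). The following is a minimal sketch of how each scores one candidate split, not the thesis's implementation. The toy data (`contract`, `churned`) and the function names are hypothetical. Full C4.5 further normalizes information gain into a gain ratio, and full CHAID converts the statistic to a p-value and merges categories. Both steps are omitted here.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())


def information_gain(feature, labels):
    """Entropy reduction obtained by splitting `labels` on `feature`."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature, labels):
        groups.setdefault(value, []).append(label)
    # Weighted average entropy of the child nodes after the split.
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


def chi_square(feature, labels):
    """Pearson chi-square statistic of the feature-by-label contingency table."""
    total = len(labels)
    observed = Counter(zip(feature, labels))
    row = Counter(feature)
    col = Counter(labels)
    stat = 0.0
    for r in row:
        for c in col:
            # Expected count under independence of feature and label.
            expected = row[r] * col[c] / total
            stat += (observed[(r, c)] - expected) ** 2 / expected
    return stat


# Hypothetical toy data: contract type vs. whether the customer churned.
contract = ["monthly", "monthly", "monthly", "monthly",
            "yearly", "yearly", "yearly", "yearly"]
churned = ["yes", "yes", "yes", "no", "no", "no", "no", "yes"]

gain = information_gain(contract, churned)
chi2 = chi_square(contract, churned)
print(f"information gain = {gain:.4f} bits, chi-square = {chi2:.2f}")
```

Both scores grow as the split separates the classes more cleanly. A tree builder would compute them for every candidate attribute and split on the best one.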
