Data mining is a technique that has developed since the 1980s. It aims to extract implicit, previously unknown, and potentially useful knowledge from large volumes of incomplete, fuzzy, and stochastic data. Outlier analysis is an important part of data mining research; its purpose is to find the "small patterns" in a dataset. An outlier is an object that is considerably dissimilar from, or inconsistent with, the remainder of the data. After 20 years of development, data mining theory has matured considerably and its range of applications keeps expanding. Data mining is now used in telecommunications, finance, business, weather forecasting, DNA analysis, stock markets, intrusion detection, customer segmentation, and other fields. In this paper we first study the cell-based outlier detection algorithm, identify its shortcomings, and improve on them. We then design a customer loyalty analysis system based on this algorithm and several other data mining techniques. Finally, we analyze the customer loyalty of the Haier Company using its customer relationship data.
    First, we describe the research background and its significance. The state of data mining research in China and abroad is analyzed from both theoretical and applied perspectives. After analyzing the general process of knowledge discovery, we present a classic framework for a data mining system, analyze the main function of each module, and elaborate on data mining techniques.
    Second, the development and current state of outlier detection research are reviewed. Distance-based, density-based, deviation-based, and high-dimensional outlier detection algorithms are introduced, their principles are analyzed, and their advantages and disadvantages are compared.
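    To make two of these families concrete, here is a minimal Python sketch. `db_outliers` implements the distance-based DB(p, D) definition of Knorr and Ng with a naive nested loop: a point is an outlier when at least a fraction p of all points lie farther than D from it, i.e. when at most N(1 - p) points (itself included) lie within distance D. `lof_scores` computes the local outlier factor (LOF) of Breunig et al. as one representative density-based method: scores near 1 mean a point is about as dense as its neighbourhood, and clearly larger scores mark outliers. Both function names are hypothetical, and the code illustrates the published definitions rather than any code from the thesis.

```python
import math


def db_outliers(points, p, D):
    """Nested-loop DB(p, D) outliers: at least a fraction p of the points
    lie farther than D away (the point itself counts as within D)."""
    limit = len(points) * (1 - p)
    result = []
    for i, a in enumerate(points):
        within = 0
        for b in points:
            if math.dist(a, b) <= D:
                within += 1
                if within > limit:      # early exit: enough neighbours found
                    break
        if within <= limit:
            result.append(i)
    return result


def lof_scores(points, k):
    """Local outlier factor per point; assumes no duplicate points
    (duplicates make the local reachability density infinite)."""
    n = len(points)
    dist = [[math.dist(a, b) for b in points] for a in points]
    kdist, neigh = [], []
    for i in range(n):
        others = sorted(dist[i][j] for j in range(n) if j != i)
        kd = others[k - 1]              # distance to the k-th nearest neighbour
        kdist.append(kd)
        # k-distance neighbourhood: may hold more than k points on ties
        neigh.append([j for j in range(n) if j != i and dist[i][j] <= kd])
    lrd = []
    for i in range(n):
        reach = [max(kdist[j], dist[i][j]) for j in neigh[i]]
        lrd.append(len(reach) / sum(reach))   # inverse mean reachability distance
    # LOF = average ratio of the neighbours' density to the point's own density
    return [sum(lrd[j] for j in neigh[i]) / (len(neigh[i]) * lrd[i])
            for i in range(n)]
```

    For example, with four points on a unit square plus a distant point at (10, 10), `db_outliers(points, 0.7, 2.0)` reports only the distant point, and its LOF with k = 2 is far above 1. The nested loop costs O(N²) distance computations, which motivates the cell-based algorithm discussed next.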
    Third, building on the cell-based outlier detection algorithm, an outlier analysis algorithm that reduces boundary effects is presented. To address the misjudgment of outliers near dataset boundaries in cell-based outlier mining algorithms, methods for partitioning the data space into cells and for allocating data objects to cells are discussed. A dynamic adjustment function for the threshold at the dataset boundary is then defined, and an improved cell-based outlier detection algorithm is proposed. The improved algorithm greatly reduces the number of misjudged boundary outliers without increasing the complexity or running time of the original algorithm, and its validity has been verified on several examples. Finally, we apply the algorithm to edge extraction in color face images, with satisfactory results.
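    For reference, the baseline cell-based DB(p, D) algorithm of Knorr and Ng, which the improvement starts from, can be sketched as follows. Space is divided into cells of side l = D / (2√k) for k-dimensional data, so two points in the same cell are closer than D/2 and two points in adjacent (L1) cells are closer than D, while points more than ⌈2√k⌉ cells away are farther than D. Whole cells can therefore be accepted or rejected by counting alone, and only the remaining cells need point-by-point checks against their L2 layer. This sketch does not include the thesis's boundary-threshold adjustment function, whose form is not given here; `cell_based_outliers` is a hypothetical name, and it should report exactly the same outliers as the nested-loop definition.

```python
import math
from collections import defaultdict
from itertools import product


def cell_based_outliers(points, p, D):
    """Baseline cell-based DB(p, D) detection (no boundary adjustment).
    A point is an outlier when at most M = N * (1 - p) points,
    itself included, lie within distance D of it."""
    k = len(points[0])
    M = len(points) * (1 - p)
    side = D / (2 * math.sqrt(k))          # cell side length l
    reach = math.ceil(2 * math.sqrt(k))    # outer radius of the L2 layer, in cells

    # Assign each point to a half-open cell via floor division.
    cells = defaultdict(list)
    for idx, pt in enumerate(points):
        cells[tuple(math.floor(x / side) for x in pt)].append(idx)

    def ring(key, lo, hi):
        # Cells whose Chebyshev distance (in cell units) from key is in [lo, hi].
        for off in product(range(-hi, hi + 1), repeat=k):
            if lo <= max(abs(o) for o in off) <= hi:
                yield tuple(c + o for c, o in zip(key, off))

    outliers = set()
    for key, members in cells.items():
        count = len(members)
        if count > M:                      # rule 1: the cell alone is dense enough
            continue
        l1 = [c for c in ring(key, 1, 1) if c in cells]
        count1 = count + sum(len(cells[c]) for c in l1)
        if count1 > M:                     # rule 2: cell plus L1 cells suffice
            continue
        l2 = [c for c in ring(key, 2, reach) if c in cells]
        count2 = count1 + sum(len(cells[c]) for c in l2)
        if count2 <= M:                    # rule 3: too few even counting all of L2
            outliers.update(members)
            continue
        # rule 4: L1 points are all within D; test L2 points one by one
        for i in members:
            within = count1 + sum(
                1 for c in l2 for j in cells[c]
                if math.dist(points[i], points[j]) <= D)
            if within <= M:
                outliers.add(i)
    return sorted(outliers)
```

    For fixed dimensionality the number of neighbouring cells examined is a constant, so the cost grows roughly linearly with N but exponentially with k, which is why cell-based methods suit low-dimensional data such as image pixels.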
    Fourth, a customer loyalty analysis system is designed. A definition of customer loyalty is presented, and the significance of studying customer loyalty is explained. The functions of the system are then described: data preprocessing, key customer identification, and customer loyalty partitioning. The preprocessing methods of the data preprocessing module are discussed, and the data mining techniques used in the key customer identification and customer loyalty partitioning modules are presented, together with the algorithms that implement them. Finally, the result visualization module is introduced; it provides two methods: parallel coordinates and categorical charts.
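    Parallel coordinates, one of the two visualization methods named above, places k vertical axes side by side and turns each record into a polyline that crosses axis j at the height of its j-th attribute. A minimal sketch of the layout step follows, assuming purely numeric attributes that are each min-max scaled to [0, 1] on their own axis (the thesis's actual scaling is not stated); `parallel_coordinates` is a hypothetical name.

```python
def parallel_coordinates(records):
    """Map each record (a tuple of k numeric attributes) to a polyline:
    vertex j lies at x = j, y = attribute j min-max scaled to [0, 1]."""
    k = len(records[0])
    lows = [min(r[j] for r in records) for j in range(k)]
    highs = [max(r[j] for r in records) for j in range(k)]
    lines = []
    for r in records:
        line = []
        for j in range(k):
            span = highs[j] - lows[j]
            # A constant attribute has no range; centre it on its axis.
            y = (r[j] - lows[j]) / span if span else 0.5
            line.append((j, y))
        lines.append(line)
    return lines
```

    Scaling each axis independently keeps attributes with very different units (for example counts and amounts) readable on one chart; the vertex lists can be passed to any plotting library that draws line segments, with colour typically encoding the loyalty class.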
    Fifth, the customer loyalty of the Haier Company is analyzed with the customer loyalty analysis system. The procedure for selecting and processing the data to be analyzed is discussed. The key customer identification results obtained under different parameter settings are analyzed and compared, and the relationship between parameter changes and the key customers found is derived. In addition, classes in the Haier customer data are obtained by clustering. Suitable data objects are then selected as the training set, and a neural network prediction algorithm assigns the final loyalty class to each customer relationship record. These results verify the validity of the customer loyalty analysis system.
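    The abstract does not say which clustering algorithm produced the classes of Haier customer data. Purely as an illustration, the sketch below assumes plain k-means (Lloyd's iterations) on numeric customer feature vectors, seeded with the first k points; `kmeans` is a hypothetical name, and the resulting labels would then serve as class targets for the neural network prediction stage, which is not sketched here.

```python
import math


def kmeans(points, k, iters=100):
    """Plain k-means (Lloyd's iterations). Seeding with the first k
    points is a deliberately simple, assumed choice."""
    centroids = [tuple(pt) for pt in points[:k]]
    labels = None
    for _ in range(iters):
        # Assignment step: each point joins its nearest centroid.
        new_labels = [min(range(k), key=lambda c: math.dist(pt, centroids[c]))
                      for pt in points]
        if new_labels == labels:
            break                       # assignments stable: converged
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [pt for pt, lab in zip(points, labels) if lab == c]
            if members:                 # keep the old centroid if a cluster empties
                centroids[c] = tuple(sum(col) / len(members) for col in zip(*members))
    return labels, centroids
```

    Seeding from the first k points keeps the example deterministic; in practice random restarts or k-means++ seeding are more robust.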
    Finally, the results are summarized and directions for future research are discussed.
