Research on Data Mining Models for Tax Assessment
Abstract
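As the socialist market economy continues to develop and mature and tax collection and administration steadily deepens, tax authorities at all levels have accumulated a considerable volume of historical tax-related data. Extracting meaningful knowledge from these data will become an important task of tax administration in an information-based environment. Against this background, this dissertation studies data mining models for tax assessment from the standpoint of the tax authorities. First, within the framework of tax compliance theory and micro revenue analysis theory, it defines why tax assessment is carried out, what it covers, and how it is done, which lays the theoretical foundation for the subsequent research. Second, for practical use in tax assessment, it proposes a user-specific data mining process model and describes its content; the model provides a reference standard for applying data mining models in tax assessment. Next, based on rough set attribute reduction, it proposes a data mining model that dynamically constructs the tax assessment index system, studies the model using value-added tax (VAT) as an example, and defines the tax assessment index system as a knowledge system. Finally, it constructs a rough set-neural network model for identifying tax assessment objects. To address that model's limited generalization ability and its two types of identification error, it proposes an identification model based on neural network ensemble learning; the results show that this model effectively improves generalization ability and reduces the rate of both types of misidentification. The findings indicate that data mining models in tax assessment help tax authorities improve the quality and effectiveness of tax administration.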
With the continuous development of the socialist market economy and the gradual deepening of tax administration in China, tax authorities at all levels have accumulated a considerable volume of historical tax-related data. Extracting meaningful information from these data has become an important task of tax administration in an information-based environment. In this context, academia and the tax authorities have begun useful research on tax assessment from the perspectives of economics, management, information technology, and other fields.
     From the point of view of the tax authorities, and within the framework of tax compliance theory and micro revenue analysis theory, the dissertation first defines why tax assessment is carried out, what it covers, and how it is done; this lays the theoretical foundation for the subsequent research. Next, the basic theory of data mining is set out and a process model for applying data mining techniques to tax assessment is explained. After analyzing the shortcomings of the two existing data mining process models, a user-specific process model of data mining is put forward, which provides a reference standard for applying data mining models. The dissertation then proposes a data mining model that builds the index system dynamically, based on rough set attribute reduction, studies the model empirically using VAT as an example, and defines the tax assessment index system as a knowledge system. It goes on to construct a rough set-neural network identification model for tax assessment that takes the reduced indices as input variables. The comparative empirical results show that rough sets are effective for optimizing the index system, but they also expose a problem of generalization ability and the problem of two types of identification error. In light of these problems, the dissertation puts forward an identification model based on a neural network ensemble. The empirical results show that this model effectively improves the generalization ability of the identification model and reduces the rate of both types of error.
     The dissertation opens by posing the problem. This chapter sets out the basic concepts of tax assessment and its course of development, summarizes current problems in tax assessment management, and points out the purpose and significance of the research. It then gives a comprehensive analysis and review of the current state of tax assessment research in China and abroad, identifies the deficiencies of that research, and sets out the content of the present study. Finally, it presents the main content and overall framework of the dissertation.
     This part of the dissertation explains the basic concepts of data mining, the difference between data mining and knowledge discovery in databases, and the models and approaches of data mining. It then analyzes two main process models, the technology-oriented Fayyad model and the application-oriented CRISP-DM, and points out the problem that arises when applying them to tax assessment: the lack of specialization across data mining tasks leaves process support deficient. Subsequently, it defines user roles according to the practice of data mining in the tax assessment process and the tasks of the data mining process, puts forward a user-specific process model of data mining, sets out its content, and thereby provides a reference standard for applying data mining models.
     Research on the model construction of the index system in tax assessment. This part first summarizes existing research on the tax assessment index system and its construction, and points out three problems: strong reliance on subjective experience, a lack of effective methods, and a relatively static index system. It puts forward a new idea: construct the index system by integrating qualitative analysis based on micro revenue analysis theory with quantitative methods based on data mining. Next, taking VAT as an example, it applies rough set attribute reduction to optimize the set of tax assessment indicators. From the perspective of dynamic change in the assessed objects, it defines the tax assessment index system as a knowledge system, puts forward a dynamic index system construction model based on rough set attribute reduction, and explains the principle and process of its dynamic adjustment.
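     To make the idea of rough set attribute reduction concrete, the following is a minimal sketch under stated assumptions, not the dissertation's own algorithm. It uses a greedy, QuickReduct-style heuristic: keep adding the indicator that most increases the dependency degree (the share of records whose indicator values determine the decision unambiguously) until the subset matches the dependency of the full indicator set. The indicator names (`tax_burden`, `input_ratio`, `inventory_turnover`), the decision column `flag`, and the six-row table are hypothetical illustration data, already discretized.

```python
from collections import defaultdict

def partition(rows, attrs):
    """Group row indices into equivalence classes by their values on attrs."""
    blocks = defaultdict(list)
    for i, row in enumerate(rows):
        blocks[tuple(row[a] for a in attrs)].append(i)
    return list(blocks.values())

def dependency(rows, attrs, decision):
    """Share of rows in the positive region: rows whose equivalence class
    under attrs maps to exactly one decision value."""
    if not rows:
        return 0.0
    consistent = 0
    for block in partition(rows, attrs):
        if len({rows[i][decision] for i in block}) == 1:
            consistent += len(block)
    return consistent / len(rows)

def greedy_reduct(rows, condition_attrs, decision):
    """Greedy forward selection (an assumed heuristic): add the attribute that
    raises dependency the most until the full-set dependency is reached."""
    target = dependency(rows, condition_attrs, decision)
    chosen = []
    while dependency(rows, chosen, decision) < target:
        best = max(
            (a for a in condition_attrs if a not in chosen),
            key=lambda a: dependency(rows, chosen + [a], decision),
        )
        chosen.append(best)
    return chosen

# Hypothetical, already-discretized decision table (illustration only).
TOY_ROWS = [
    {"tax_burden": "low",  "input_ratio": "high", "inventory_turnover": "low",  "flag": 1},
    {"tax_burden": "low",  "input_ratio": "high", "inventory_turnover": "high", "flag": 1},
    {"tax_burden": "high", "input_ratio": "low",  "inventory_turnover": "low",  "flag": 0},
    {"tax_burden": "high", "input_ratio": "low",  "inventory_turnover": "high", "flag": 0},
    {"tax_burden": "low",  "input_ratio": "low",  "inventory_turnover": "low",  "flag": 1},
    {"tax_burden": "high", "input_ratio": "high", "inventory_turnover": "high", "flag": 0},
]
CONDITIONS = ["tax_burden", "input_ratio", "inventory_turnover"]

reduct = greedy_reduct(TOY_ROWS, CONDITIONS, "flag")
print(reduct)  # ['tax_burden'] for this toy table
```

     A greedy search of this kind returns a subset that preserves the dependency degree but is not guaranteed to be a minimal reduct. Re-running the reduction as new assessment records arrive is one simple way to picture a dynamically adjusted index system.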
     Research on the identification model of tax assessment. After summarizing and analyzing the methods used to identify tax assessment objects (peak value analysis, discriminant analysis, expert systems, and neural networks), this part constructs a rough set-neural network identification model and trains and simulates it with actual data from VAT tax assessment. Verification and analysis show that the identification performance of the model is not very satisfactory and that it offers no solution to the problem of the two types of identification error. Next, building on neural network ensemble learning, an "8-2 absolute vote" method for generating the conclusion of the ensemble is put forward. An identification model based on neural network ensemble learning is designed and studied empirically; the results show that it effectively enhances generalization ability and reduces the misidentification rate. It is therefore feasible to apply the model to the identification of tax assessment objects.
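     The abstract does not define the "8-2 absolute vote" rule. One plausible reading, assumed here purely for illustration, is an ensemble of ten networks whose conclusion is accepted only when at least eight members agree; otherwise the case is left undecided. The sketch below shows that supermajority rule. The function name `supermajority_vote`, the `min_agree` parameter, and the 0/1 label coding are hypothetical.

```python
from collections import Counter

def supermajority_vote(votes, min_agree=8):
    """Return the label backed by at least min_agree members, else None.

    votes: class labels predicted by the individual networks.
    None means the ensemble abstains (assumed behaviour, not from the source).
    """
    if not votes:
        return None
    label, count = Counter(votes).most_common(1)[0]
    return label if count >= min_agree else None

# Ten hypothetical member outputs: 1 = select for assessment, 0 = do not.
print(supermajority_vote([1, 1, 1, 1, 1, 1, 1, 1, 0, 0]))  # 8 of 10 agree -> 1
print(supermajority_vote([1, 1, 1, 1, 1, 1, 0, 0, 0, 0]))  # only 6 agree -> None
```

     Under this reading, requiring a large margin trades coverage for reliability: fewer cases receive an automatic conclusion, but those that do are less likely to fall into either type of misidentification.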
     (1) The dissertation defines why tax assessment is carried out, what it covers, and how it is done, within the framework of tax compliance theory and micro revenue analysis theory and from the point of view of the tax authorities. (2) It studies the process model of data mining in tax assessment and puts forward a user-specific process model of data mining in tax assessment. (3) For the tax assessment index system, it puts forward a dynamic construction model based on rough set attribute reduction and defines the tax assessment index system as a knowledge system. (4) It puts forward an identification model based on a neural network ensemble to enhance the generalization ability of neural network models and reduce the misidentification rate. Finally, the chapter looks ahead to issues that need further research.
