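Research and Application on the Incremental Data Processing Technology in Network Audit

Original title (Chinese): 联网审计中增量数据处理技术的研究与应用
Author: Lang Dapeng (郎大鹏)
Degree level: Master's
Discipline: Computer Application Technology
Keywords: social insurance audit; heterogeneous database; incremental data
Degree year: 2009
Supervisor: Huang Shaobin (黄少滨)
Discipline code: 081203
Degree-granting institution: Harbin Engineering University (哈尔滨工程大学)
Thesis submission date: 2009-02-20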

Abstract

As computer-assisted auditing of social security spreads in China, growing data volumes and continually revised audit plans are making the processing, storage, identification, and transmission of incremental data an area of increasing attention among researchers at home and abroad. Networked auditing has to overcome geographical limits so that audit work can be carried out through remote deployment. At the same time, auditors need timely insight into how social security data changes from one stage to the next in order to anticipate audit results, which makes the extraction and identification of incremental data a problem that cannot be ignored.

This thesis draws on existing incremental data extraction techniques, including parsing of the database's built-in logs, timestamps, snapshots, triggers, API-based capture, and change trace tables, and proposes a multi-thread based improved shadow table method that is applicable to incremental data extraction from multiple types of databases. Based on this method, a preliminary incremental data processing system was implemented. It consists of four parts: a data conversion module, a core comparison module, an instruction setting module, and a result display module. With extensibility as a primary design consideration, the system is written in Java, and incremental data and configuration parameters are exchanged as plain-text (txt) files, which allows system administrators and auditors to configure the system. To meet the needs of audit work, auditors can freely choose which attributes are subject to incremental extraction and can stop incremental identification and extraction as circumstances require. All operations performed by auditors and by the software are recorded in a log.

Finally, the software was deployed on the equipment of an audit institution, and the method was tested with real data from a city in Heilongjiang Province. The results basically meet the requirements of networked auditing, achieving efficient information exchange and greater audit flexibility.
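The abstract names a shadow-table comparison carried out by multiple threads but does not give the algorithm. The Java sketch below illustrates only the general idea, under stated assumptions: each table is modeled as an in-memory map from primary key to a serialized row, the shadow map holds the rows as of the previous extraction, and keys are split across worker threads by hash. The class name `ShadowTableDiff`, the `ChangeType` enum, and the sample rows are hypothetical. This is not the thesis's implementation.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.Objects;
import java.util.TreeMap;
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Hypothetical illustration of a multi-threaded shadow-table comparison.
final class ShadowTableDiff {

    /** Kinds of change detected between the shadow copy and the current table. */
    enum ChangeType { INSERT, UPDATE, DELETE }

    /**
     * Compares the shadow copy with the current rows (both keyed by primary key)
     * and returns primary key -> change type, sorted by key.
     * The maps are only read, never modified, so sharing them across threads is safe.
     */
    static Map<String, ChangeType> diff(Map<String, String> shadow,
                                        Map<String, String> current,
                                        int threads) {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Map<String, ChangeType>>> parts = new ArrayList<>();
            for (int i = 0; i < threads; i++) {
                final int slot = i;
                Callable<Map<String, ChangeType>> task =
                        () -> diffPartition(shadow, current, slot, threads);
                parts.add(pool.submit(task));
            }
            // Merge the per-thread results; partitions are disjoint, so no key collides.
            Map<String, ChangeType> result = new TreeMap<>();
            for (Future<Map<String, ChangeType>> part : parts) {
                result.putAll(part.get());
            }
            return result;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("comparison interrupted", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("comparison failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    /** Handles only the keys whose hash falls into this worker's slot. */
    private static Map<String, ChangeType> diffPartition(Map<String, String> shadow,
                                                         Map<String, String> current,
                                                         int slot, int threads) {
        Map<String, ChangeType> changes = new HashMap<>();
        for (Map.Entry<String, String> row : current.entrySet()) {
            String key = row.getKey();
            if (!belongsTo(key, slot, threads)) {
                continue;
            }
            if (!shadow.containsKey(key)) {
                changes.put(key, ChangeType.INSERT);      // new since last extraction
            } else if (!Objects.equals(shadow.get(key), row.getValue())) {
                changes.put(key, ChangeType.UPDATE);      // same key, different content
            }
        }
        for (String key : shadow.keySet()) {
            if (belongsTo(key, slot, threads) && !current.containsKey(key)) {
                changes.put(key, ChangeType.DELETE);      // present before, gone now
            }
        }
        return changes;
    }

    private static boolean belongsTo(String key, int slot, int threads) {
        return Math.floorMod(key.hashCode(), threads) == slot;
    }

    public static void main(String[] args) {
        // Made-up sample rows, for illustration only.
        Map<String, String> shadow = new HashMap<>();
        shadow.put("1001", "rowA|v1");
        shadow.put("1002", "rowB|v1");
        shadow.put("1003", "rowC|v1");

        Map<String, String> current = new HashMap<>();
        current.put("1001", "rowA|v1");
        current.put("1002", "rowB|v2");
        current.put("1004", "rowD|v1");

        System.out.println(diff(shadow, current, 4));
    }
}
```

In a real deployment the two maps would presumably be replaced by reads from the source table and its shadow table (for example over JDBC), and the shadow table would be refreshed after each extraction. Those steps are outside this sketch.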

