Research and Implementation of a Multilingual Encoding System Based on Chinese, Japanese, and Korean
Abstract
Computer forensics in China currently concentrates on Simplified Chinese editions of operating systems such as Windows. As international exchange grows and reform and opening-up deepen, the number of computer cases involving foreign nationals keeps rising. Provinces and municipalities nevertheless lack precise, efficient tools for post-processing multilingual data, which makes handling electronic evidence in several languages cumbersome. Investigators often spend a great deal of time on such cases, and because each unit uses its own methods, the techniques are hard to share and popularize.
     To address these problems, this thesis identifies the shortcomings of domestic information-analysis and forensic systems and analyzes in depth the character encodings commonly used on computers in China, Japan, and Korea. The system outputs the possible hexadecimal encodings of a search keyword as it may be stored on a computer, so that investigators can search for those hexadecimal values with standard forensic software and locate case leads quickly. This simplifies the forensic workflow and raises efficiency without lowering the quality of existing work.
     The thesis first explains the significance of a multilingual encoding system based on Chinese, Japanese, and Korean and reviews the state of computer forensic investigation and analysis in China and abroad. It then surveys the encodings used in the three countries and analyzes their content, characteristics, and encoding/decoding principles and methods. Taking Unicode as the core encoding, it examines the local encodings for Chinese (GB2312, BIG5, GBK, and others), Japanese (JIS, SHIFT_JIS, EUC-JP, and others), and Korean (EUC-KR and others), together with the secondary encodings: RTF as used by WordPad, Base64 and quoted-printable (QP) as used in e-mail, and UTF-8. The system can also batch-export keywords directly as keyword-search scripts for EnCase, a major forensic tool. The thesis then describes the system design, the runtime environment, and the implementation. Finally, it summarizes the research and development work and proposes next steps to address the remaining shortcomings.
     Testing indicates that the system effectively improves the efficiency of forensic staff, widens the scope of forensic searches, and provides a useful tool for combating computer crime. It can also be used to repair garbled text and to support international information exchange. The system has been widely adopted and has considerable social and practical value.
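The keyword-to-hexadecimal step described in the abstract can be illustrated with a short Python sketch. It is not the thesis's implementation (the references suggest the system was built with Visual C++); it only shows the idea under stated assumptions. The function names keyword_hex_patterns and secondary_forms are hypothetical, the codec names are those of Python's standard library, "JIS" is assumed to mean ISO-2022-JP, and "Unicode" is assumed to mean UTF-16 little-endian.

```python
import base64
import quopri

# Labels follow the abstract; codec names are Python standard-library names.
# Mapping "JIS" to ISO-2022-JP and "Unicode" to UTF-16LE is an assumption.
CODECS = {
    "GB2312": "gb2312",
    "GBK": "gbk",
    "BIG5": "big5",
    "JIS (ISO-2022-JP)": "iso2022_jp",
    "SHIFT_JIS": "shift_jis",
    "EUC-JP": "euc_jp",
    "EUC-KR": "euc_kr",
    "UTF-8": "utf-8",
    "Unicode (UTF-16LE)": "utf-16-le",
}


def keyword_hex_patterns(keyword):
    """Return {label: uppercase hex string} for each codec that can encode keyword."""
    patterns = {}
    for label, codec in CODECS.items():
        try:
            patterns[label] = keyword.encode(codec).hex().upper()
        except UnicodeEncodeError:
            # The keyword contains characters outside this character set.
            continue
    return patterns


def secondary_forms(keyword, codec):
    """Return secondary encodings of keyword after encoding it with codec."""
    raw = keyword.encode(codec)
    return {
        "base64": base64.b64encode(raw).decode("ascii"),
        "quoted-printable": quopri.encodestring(raw).decode("ascii"),
        # RTF writes each non-ASCII code-page byte as \'hh.
        "rtf": "".join("\\'%02x" % byte for byte in raw),
    }


if __name__ == "__main__":
    for label, hex_value in keyword_hex_patterns("中文").items():
        print(label, hex_value)
    print(secondary_forms("中文", "utf-8"))
```

For the keyword 中文, the GBK entry is D6D0CEC4 and the UTF-8 entry is E4B8ADE69687; an investigator would search the evidence image for each of these byte sequences. Codecs that cannot represent the keyword are skipped rather than reported as errors, since a Korean keyword, for example, has no GB2312 form.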