Research on Key Technologies for Reverse Analysis of Firmware Code
Abstract
Firmware reverse analysis is an important branch of software reverse engineering. By identifying the processor type, restoring the format, and analyzing the structure of the executable code embedded in a device, it recovers the logic and function of the firmware. This helps in understanding how the device is built and which techniques it uses, and improves the ability to dissect its software system. With the development of network technology and the widespread use of network cryptography, reverse analysis of key network devices such as routers and encryption/decryption systems is of particular importance to national security and information acquisition.
     The core component of an embedded electronic device is its processor. Because embedded technology is applied so widely and unpredictably, processor choices are diverse, so firmware reverse analysis methods and tools must handle multiple processors and multiple instruction sets. In addition, as the number of processor types keeps growing, it becomes increasingly uncertain which processor an analyst will encounter, so code analysis tools must be extensible, and in particular extensible by the user.
     However, none of the commercial tools for executable format restoration and none of the existing code structure analysis techniques adapts well to multiple processors and instruction sets. They are weak at presenting code structure visually and at supporting interactive structure analysis, they are not user-extensible, and no published research results on instruction-set identification have been reported.
     To address these problems, the dissertation investigates the theory and techniques of firmware reverse analysis and proposes effective solutions to several practical key techniques, as follows:
     1. Based on an analysis of mainstream processors and instruction set architectures, it proposes a template for representing processor structure and instruction sets, built on multidimensional variable description tables, and uses database technology to manage processor structure information and instruction sets. This solves the problem of applicability to multiple processors and instruction sets and ensures that a firmware reverse analysis system built on it is user-extensible.
     2. Based on a study of the structural characteristics of firmware code, it proposes a disassembly strategy that combines a static program flow traversal graph with a program flow implication graph, and designs a disassembly engine based on instruction grouping and hash matching, which improves both the speed and the accuracy of firmware disassembly.
     3. It proposes multi-linked-list and hierarchical-tree representations of subroutine structure and subroutine call relationships, and designs algorithms for extracting and adjusting code structure and flow. The visual hierarchical display lets users analyze code logic and function interactively and makes the analysis process more intuitive.
     4. By mining the inherent features of firmware code, it builds a code feature model suited to instruction-set type identification and designs algorithms for code feature extraction and for instruction-set identification based on multi-attribute decision making. This effectively addresses the problem of unknown processor types encountered when dissecting electronic devices.
     The dissertation also describes a practical verification environment for these key techniques. The verification results indicate that the proposed solutions are effective and practical.
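     To make the table-driven instruction description and the flow-following disassembly idea from the abstract more concrete, the following is a minimal Python sketch. It is not the dissertation's design: the toy 8-bit instruction set and the names InstrDesc, TOY_ISA, and disassemble are hypothetical. A plain dictionary stands in for hash matching of opcodes, and a simple worklist traversal stands in for the static flow traversal. The flow implication graph and the database-backed description tables are not modeled.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class InstrDesc:
    """One row of a (hypothetical) instruction description table."""
    mnemonic: str
    length: int  # total size in bytes, opcode included (1 or 2 here)
    kind: str    # "seq", "jump", "branch", "call", or "ret"


# Hypothetical toy instruction set, keyed by opcode byte. Supporting another
# processor would mean supplying a different table, not changing the engine.
TOY_ISA = {
    0x00: InstrDesc("NOP", 1, "seq"),
    0x01: InstrDesc("LDI", 2, "seq"),     # load an immediate byte
    0x10: InstrDesc("JMP", 2, "jump"),    # unconditional, absolute target
    0x11: InstrDesc("JZ", 2, "branch"),   # conditional, absolute target
    0x12: InstrDesc("CALL", 2, "call"),   # subroutine call, absolute target
    0x13: InstrDesc("RET", 1, "ret"),
}


def disassemble(code, entry, isa):
    """Decode only the bytes reachable from `entry` by following control flow.

    Returns a dict mapping offset -> (mnemonic, operand or None), sorted by
    offset. Bytes that are never reached are left out, so embedded data is
    not decoded as instructions.
    """
    listing = {}
    worklist = [entry]
    while worklist:
        pc = worklist.pop()
        while 0 <= pc < len(code) and pc not in listing:
            desc = isa.get(code[pc])  # hash lookup on the opcode byte
            if desc is None or pc + desc.length > len(code):
                break  # unknown opcode or truncated instruction
            operand = code[pc + 1] if desc.length > 1 else None
            listing[pc] = (desc.mnemonic, operand)
            if desc.kind in ("jump", "branch", "call"):
                worklist.append(operand)  # explore the target later
            if desc.kind in ("jump", "ret"):
                break  # control never falls through to the next byte
            pc += desc.length
    return dict(sorted(listing.items()))
```

     For example, with the image bytes([0x01, 0x05, 0x12, 0x06, 0x13, 0xFF, 0x00, 0x13]) and entry offset 0, the traversal decodes the instructions at offsets 0, 2, 4, 6, and 7 and leaves the byte at offset 5 unclassified, because no control-flow path reaches it.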
