公开密钥算法RSA的分析及其IP核的实现与验证
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
随着电子商务的发展,出现了智能卡、电子钥匙USB KEY等,广泛应用于交通、身份认证等领域,极大的方便了人们的工作和生活。RSA是目前应用最广泛的公开密钥算法,在智能卡等小型移动设备中实现RSA密钥算法,具有十分重大的意义。
     本设计的设计目标定义为面向低端,兼顾小面积和高性能,设计内容是包括RSA IP核设计和RSA密钥生成在内的一整套RSA算法解决方案。为保证安全性,可支持密钥长度要达到2048比特。
     本文首先对RSA加解密算法进行了深入分析,确定使用RL方式的二进制扫描算法实现模幂,使用Montgomery CIOS算法实现模乘,详细分析算法参数的选择,并对模平方等算法进行优化。通过软件建模,明确了算法实现的层次,为硬件实现奠定了良好的基础。
     在IP核设计中,根据设计目标选择32位高基模乘器作为核心硬件结构。之后合理划分层次模块,根据算法优化控制逻辑。在模乘器数据通路中,使用两级流水线,4-2压缩器等技术缩短关键路径,并使用两倍于模乘器字长的T SRAM存储运算中间结果,进一步提高了硬件利用率。存储系统使用5个SRAM存储操作数、中间结果和最终结果,有效缩小IP面积;用反相时钟技术和高效的存储策略,将其他模块与存储系统交互的开销降到最低。最终使IP核实现小面积、高性能,在100Mhz时钟下,2048位模幂(操作数长度均为2048位)的速度约为3.7次/秒,1024位模幂速度约为33次/秒,性能优良。IP核支持大于32位,小于2048位的模幂和模乘运算,采用通用接口设计,利于SOC集成,且具备密钥保护功能,操作简便。
     为了达到商用标准,本设计对RSA IP核进行了严格的测试验证,包括功能测试、性能测试和压力测试。遵循经典ASIC设计流程,搭建符合C*Bus总线时序的仿真环境,做综合,静态时序分析,形式验证,以及版图设计。本设计在smic 0.18um工艺下最终版图面积小于1mm2,在100Mhz时钟下满足前端定义的时序约束。
     由于RSA密钥更换频率较低,为了节省硬件资源,采用嵌入式软件的方式实现RSA密钥生成算法。本设计使用的算法用于生成1024位和2048位RSA密钥,基于国产32位CPU核(C*CORE C340),使用硬件真随机数发生器HRNG和RSA协处理器进行硬件加速。通过对算法原理和步骤进行深入分析,提出生成素数的三个步骤:随机产生奇数候选数,预筛选,素性测试。通过研究和优化预筛选算法,大大提高了产生素数的效率。分别使用Euclid算法和扩展Euclid算法求解最大公约数和模逆。最后根据嵌入式系统的特点对整个算法进行了优化,测试结果与同类产品相比有一定优势。
With the development of electronic business, smart card, USB Key and other small mobile devices are widely used in areas like traffic and identity authentication, and are very helpful in people's work and life. Nowadays, RSA is the most widely-spread public-key cryptography. It is highly significant to implement RSA in small mobile devices like smart card.
     This design will be used in low-end market. The design goal is to keep balance between area and time, and to deliver a whole RSA cryptography implementation scheme including RSA IP design and RSA key generation. In order to ensure the security, the length of RSA key should be as long as 2048 bits.
     Firstly, RSA encryption and decryption algorithms are deeply analyzed in the paper. RL binary method is used for modular exponentiation, and Montgomery CIOS algorithm used for modular multiplication. I describe in detail the selection of parameters and the optimization of the algorithms. Through software modeling, the algorithm hierarchy is defined, which provides a basis for hardware implementation.
     During the design of IP core, according to the design goal, I select 32-bit high radix modular multiplier as the core hardware structure. After that, I divide the whole design into proper hierarchy and modules, and optimize the control logic based on algorithms. For data path in modular multiplier, I use two-level pipeline, 4-2 compressor, and some other methods to shorten the critical path, then adopt T SRAM in which the word length is twice the word length of the multiplier, in order to increase the hardware utilization ratio. For memory system, to decrease the IP area, I use 5 SRAMs to store operators, intermediate results and final results, and I use reverse-phase clock and efficient storage strategy to make sure that the cost of data transmission between memory and other modules is the lowest. Finally I achieve the goal of small area and high performance. At a clock frequency of 100 MHz, the speed of 2048-bit modular exponentiation (all operators are 2048-bit long) is about 3.7 times per second, and 1024-bit modular exponentiation is about 33 times per second. The IP core supports modular exponentiation and modular multiplication of 32-bit to 2048-bit operators, has a universal interface which can be easily integrated in SOC, provides key protection function and quite easy to operate.
     To come up to business standards, I have done strict tests and verification on RSA IP core, including function tests, performance tests and pressure tests. According to the general ASIC design flow, I establish the simulation environment based on C*Bus protocol. Then I do synthesis, static timing analysis, formal verification, and layout design. The IP layout area is less than 1 mm2 in smic 0.18um technology, and meets the frontend timing requirements at 100 MHz clock frequency.
     Since the frequency of RSA key changes is quite low, in order to save hardware resources, RSA key generation algorithm is implemented with software in embedded system. The algorithm in the design is used to generate 1024-bit or 2048-bit RSA key, accelerated by true random data generator HRNG and RSA coprocessor in the system, and implemented based on 32-bit processor C*CORE C340. Through detailed analysis of algorithm principle, I come up with three steps to generate a big prime number: random generation of odd candidates, trial division with small primes, and primality tests. Efficiency of prime generation is greatly increased by careful research and optimization of trial division. Greatest common divisor and modular inversion are resolved with Euclid algorithm and extended Euclid algorithm, respectively. At the final stage the whole design is optimized based on the features of embedded system, and the test result is better compared with similar products.
引文
[1] William Stallings.密码编码学与网络安全——原理与实践(第三版).刘玉珍?王丽娜?傅建明等译.北京?电子工业出版社?2005?79-87?172-203
    [2] KocCK.High-speed RSA implementation.RSALaboratories Technical Report, TR801, V1.0, 1996.
    [3]周玉洁?冯登国.公开密钥密码算法及其快速实现.北京?国防工业出版社?2002
    [4]陈韬?基于CIOS算法的RSA芯片设计与实现?[硕士学位论文]?中国人民解放军信息工程大学?2004
    [5] Cetin Kaya Koc. RSA Hardware Implementation.RSA Laboratories, V1.0, 1995
    [6] Peter L. Montgomery. Modular Multiplication without Trial Division.Mathematics of Computation?1985?44(170): 519-521
    [7] CetinKayaKoe ? BurtonS.Kaliski Jr ? Tolga Aear.Analyzing and Comparing Montgomery Multiplication Algorithms.IEEE Micro, 1996, 16(3): 26-33
    [8] Stephen R.Dusse, Burton S.Kaliski Jr.A Cryptographic Library for the Motorola DSP56000. Advances in Cryptology– EUROCRYPT’90, LNCS 473, 1991: 230-244
    [9]李明久. Montgomery算法分析与应用改进.计算机工程与应用, 2007, 43(1): 68-69, 78
    [10]李树国?周润德?冯建华等. RSA密码协处理器的实现.电子学报, 2001, 29(11): 1441-1444
    [11]邓玉良?毛志刚?叶以正. Montgomery算法及其在加密硬件中的实现.计算机研究与发展?1999, 36(4): 505-508
    [12] A.J.Menezes.Handbook of Applied Cryptography.CRC Press, 1997
    [13]曾繁泰等. EDA工程方法学.北京:清华大学出版社?2003: 24-101
    [14]曾繁泰等. EDA工程实践.北京:清华大学出版社?2004: 136-159
    [15] Colin D.Walter. Systolic modular multiplication.IEEE Transactions on Computers, March 1993, 42(3)
    [16]温暖.基于心动阵列结构的RSA公钥密码协处理器的设计与实现: [硕士学位论文]?中国人民解放军信息工程大学?2005
    [17]况敬波.RSA加密算法的IP核研究与实现: [硕士学位论文]?哈尔滨工程大学?2006
    [18] C*Bus使用手册.苏州国芯科技有限公司
    [19] EDA先锋工作室?吴继华?王诚.Verilog设计与验证?人民邮电出版社?2006?第六章
    [20]朱华.椭圆曲线密码ξECCˇ研究分析及其IP的实现与验证: [硕士学位论文]?上海交通大学?2007
    [21] Chip-Hong Chang, Jiangmin Gu, Mingyan Zhang. Ultra Low-Voltage Low-Power CMOS 4-2 and 5-2 Compressors for Fast Arithmetic Circuits.IEEE TRANSACTIONS ON CIRCUIT AND SYSTEMS, VOL.51, NO.10, OCTOBER 2004?1985-1988
    [22] David A.Hodges, Horace G.Jackson, Resve A. Saleh. Analysis and Design of Digital Integrated Circuitsξ第三版影印本ˇ.北京?清华大学出版社?2004
    [23] M.Johnson, B.Phung, T.Shackelford, etc.Modular Reduction of Large Integers Using Classical, Barrett, Montgomery Algorithms
    [24] Gael Hachez, Jean-Jacques Quisquater. Montgomery Exponentiation with no Final Subtractions: Improved Results. Universite Catholique de Louvain, UCL Crypto Group
    [25]夏宇闻.Verilog数字系统设计教程.北京:北京航空航天大学出版社?2003.7
    [26] Himanshu Bhatnagar.Advanced ASIC Chip Synthesis Using Synopsys Design Compiler and Prime Time: Kluwer Academic Publishers, 2002
    [27] Synopsys Astro1 Workshop Student Guide, SYNOPSYS, 2004.06
    [28] Dan Clein. CMOS集成电路版图——概念?方法与工具.北京?电子工业出版社?2006
    [29]庄宗桓. Single Pass Physical Design with Astro.CICeNEWS 52技术论坛?1994
    [30] Michael D. Ciletti. Verilog HDL高级数字设计.张雅琦?李锵译.北京?电子工业出版社?2005.1
    [31] IEEE Std 1363-2000, IEEE Standard Specifications for Public-Key Cryptography, 30 January 2000
    [32]崔竞松?涂航?彭蓉等.嵌入式系统中大素数的快速生成.计算机工程, 2003,2 9(5): 24-58
    [33]姚国祥?林良超. RSA密钥对高效生成算法.计算机工程,2007,33(20):148-152
    [34] Venkat Suryadevara.Efficient On-board RSA Key Generation with Smart Cards.School of Electrical Engineering and Computer Science, Oregon State University
    [35]桂文明. 2048位RSA协处理器IP核的研究与设计: [硕士学位论文]?兰州大学?2007
    [36] H Handschuh, P Pailier. Smart Card Crypto-Coprocessor for Public-Key Cryptography.Smart card research and applications. Berlin: Springer, 2000:372-379

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700