Research and Implementation of Key Technologies for Programmable Cryptographic Processors
Abstract
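Cryptographic algorithms are the basic means of meeting security requirements such as the confidentiality, integrity, and availability of information. For reasons of performance and implementation security, cryptographic algorithms often need to be implemented in hardware. Application-specific integrated circuits (ASICs) and fine-grained reconfigurable structures are the two traditional hardware approaches. ASICs are efficient but cannot meet the need for flexible implementation of cryptographic algorithms in the application environment. Fine-grained reconfigurable structures are highly flexible, but their generality brings a high design cost.
     Because cryptographic algorithms have relatively fixed processing patterns, researchers have proposed a variety of cryptography-specific reconfigurable structures and cryptographic processors, based on spatial programmability and temporal programmability respectively, which balance performance against flexibility to some extent. However, existing cryptography-specific reconfigurable structures generally suffer from difficult algorithm mapping, which limits their application. Current cryptographic processors make algorithm development convenient through compiler tools, but they are constrained by traditional architectures: both the complexity and the number of custom functional units that can be added are limited, and datapath efficiency is low.
     Starting from temporal programmability, this paper moves the hardware/software interface of the traditional architecture downward so that software sees the data transports and the interconnection network inside the processor. This supports complex yet efficient datapaths that match cryptographic processing patterns more easily, and ultimately yields efficient programmable cryptographic processors. The main work and results are as follows:
     1. An automatic generation method for application-specific instruction-set processors (ASIPs) guided by the transport triggered architecture (TTA) is proposed. In TTA, what software sees is the data transports between functional units (FUs), so the hardware design can support partitioned register files and more numerous, more complex custom FUs, while the problems of instruction-set generation and retargetable compilation are solved at the same time. A configuration stream driven computing architecture (CSDCA) is proposed, which moves the hardware/software interface further downward: the compiler performs the transport routing inside the processor, so that efficient but complex interconnection networks can be supported. With segmented-bus interconnection, it largely resolves two problems that arise as the number of FUs grows: data-transport delay becoming the bottleneck for clock frequency, and severe redundant bus power consumption. A dual-mode computing method is proposed to improve code density: the critical loops of a program execute in CSDCA mode for performance, and the rest runs in RISC mode to reduce code redundancy. Together, these results establish an ASIP design flow that supports efficient datapaths.
     2. A high-performance modular exponentiation processor is proposed and implemented. A high-radix Montgomery algorithm that takes the radix length as the processing word length (RBHRMMM) is proposed. Combined with a parallel modular exponentiation algorithm, it decomposes large-number modular exponentiation into a sequence of atomic-operation matrices, and a column-sharing super-pipelined array (CSSA) is designed on the column-sharing principle. With the CSSA as a special functional unit and the ASIP design flow above, the complete modular exponentiation processor SEA-II is obtained, with an equivalent gate count of 923k. 1024-bit RSA decryption on SEA-II reaches 6,353 Kbps.
     3. A scalable dual-field processor for complete public-key algorithms is proposed and implemented. A dual-field unified RBHRMMM algorithm is proposed, and a row-sharing pipelined unit (RSSA) is designed on that basis. Coupling the RSSA into the existing ASIP design flow and adding large-number registers yields the complete public-key algorithm processor SPKP. SPKP has the following features: ① with the software tools, a complete public-key cryptosystem can be developed quickly; ② the RSSA scales well; ③ the pipeline units implement vector multiplication and support both GF(p) and GF(2^n); ④ different performance/area constraints can be met by adjusting the bus width and the number of pipeline units in the RSSA.
     4. A high-performance secure hash processor is proposed and implemented. A new way of partitioning the computation of hash algorithms is proposed: the computation is divided into a compression module and an expansion module, each comprising three submodules for queueing, shuffling, and accumulation. A reconfigurable functional unit designed on this partition is coupled into the existing ASIP design flow to obtain the secure hash processor PSHP. Compared with fine-grained reconfigurable structures, PSHP has higher logic utilization, faster configuration and computation, and is easier to develop for; compared with ASIC implementations, it flexibly supports the common hash algorithms at a small cost in performance and area.
     5. A high-performance block cipher processor, PSCP, is proposed and implemented. Two optimization principles for block cipher processors are proposed: ① add a substitution unit and a subkey storage unit so that the number of memory accesses during the core computation drops to zero; ② regroup the basic operations to balance the delay distribution. Compared with ASIC implementations, PSCP achieves similar performance in block-dependent encryption modes such as CBC, OFB, and CFB, while being more flexible. Compared with cryptography-specific reconfigurable structures, PSCP is easy to develop for and can implement complete algorithms, including key expansion, which gives better security.
     The work above first establishes an ASIP design flow that supports complex datapaths and then, for specific classes of cryptographic algorithms and the needs of practical application environments, studies and implements four efficient and highly usable programmable cryptographic processors. All the processors target a 0.18 μm 1P6M CMOS process, and the modular exponentiation processor has already been put into application.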
Cryptographic algorithms (CAs) are widely used to meet security requirements such as confidentiality, integrity, and availability. For reasons of performance and implementation security, CAs often have to be realized in hardware. Application-specific integrated circuits (ASICs) and fine-grained reconfigurable structures (FRSs) are the two traditional approaches. A well-known drawback of the ASIC solution is its low flexibility. FRSs are sufficiently flexible but suffer significant overhead because of their generic nature.
     CAs have relatively fixed granularity and similar processing modes. Researchers have proposed several cryptography-specific reconfigurable structures based on spatial programmability and several cryptographic processors based on temporal programmability; these works achieve good tradeoffs between performance and flexibility. However, current reconfigurable structures see limited practical use because CAs are difficult to map onto them. Cryptographic processors make algorithm development convenient through a compiler, but their datapaths are constrained by traditional architectures and cannot accelerate CAs efficiently.
     Starting from temporal programmability, this paper shifts the hardware/software interface downwards and lets the software specify data transports and the routing path of every transport. This addresses the difficulty of designing complex but efficient datapaths in traditional architectures. For different classes of cryptographic algorithms and different application environments, several practical programmable cryptographic processors are proposed and implemented. The main work and results are:
     1. We propose an automatic generation method for application-specific instruction-set processors (ASIPs) guided by the transport triggered architecture (TTA). In TTA, software specifies the data transports among function units (FUs), so application-specific hardware can support more sophisticated FUs, and the problems of instruction-set generation and retargetable compilation are solved at the same time. We also propose a configuration stream driven computing architecture (CSDCA), in which routing is performed by the compiler to support efficient but complex interconnections. Combined with segmented buses, CSDCA addresses the problem that, as the number of FUs grows, the TTA interconnection network becomes a bottleneck for clock frequency and consumes much redundant power for each data transport. RISC|CSDCA dual-mode computing is proposed to improve code density: computation-intensive loops, which account for most of the computing time, run in CSDCA mode for higher performance, and the remaining code runs in RISC mode to reduce code redundancy. Together, these results form an ASIP design flow that supports efficient but complex datapaths.
     2. We propose and implement a high-performance modular exponentiation (ME) processor. We propose a radix-length-based high-radix Montgomery modular multiplication algorithm (RBHRMMM); with this algorithm, an ME can be decomposed into a series of primitive operation (PO) matrices. A column-sharing super-pipelining array (CSSA) is designed to execute these PO matrices. Combined with the above ASIP design flow, a complete ME processor, SEA-II, is implemented. SEA-II achieves a decryption rate of 6.35 Mbps for 1024-bit RSA.
     3. We propose a dual-field scalable processor that implements whole public-key cryptosystems. A dual-field unified RBHRMMM algorithm is proposed, and a row-sharing super-pipelining array (RSSA) is designed on the basis of this algorithm. By embedding the RSSA into the above ASIP design flow, a scalable public-key processor, SPKP, is implemented. SPKP has the following characteristics: (I) whole ECC algorithms can be developed conveniently through the TTA tool chain; (II) the RSSA is scalable; (III) the pipeline elements perform vector multiplication and support the Galois fields GF(p) and GF(2^n); (IV) different performance/area constraints can be met by adjusting the bus width and the number of pipeline elements in the RSSA.
     4. We propose a high-performance cryptographic hash processor. We propose a novel method of partitioning hash algorithms: the kernel of a hash algorithm is split into compression modules and an expansion module, and every module has the same structure, consisting of a queue, a shuffle sub-module, and an accumulator. Custom reconfigurable FUs are designed based on this method, and by integrating them into the ASIP design flow, a cryptographic hash processor, PSHP, is implemented. Compared with fine-grained reconfigurable architectures, PSHP is faster and more area-efficient; compared with ASICs, it supports the widely used hash algorithms with little overhead.
     5. We propose a high-performance block cipher processor, PSCP. We propose two optimization principles: (I) reduce the number of memory accesses in kernels to zero by adding a substitution unit and a sub-key storage unit; (II) reorganize the basic operations to balance the delay distribution. Compared with ASIC solutions, PSCP achieves similar performance in CBC, CFB, or OFB mode while being more flexible. Compared with custom reconfigurable structures, PSCP is more convenient to develop for and supports the complete algorithm, including key expansion, so it is more secure and more usable.
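     The transport-triggered programming style mentioned in item 1 can be illustrated with a minimal Python sketch. It is not the processor or the tool chain described in this work: the port names (operand, trigger, result), the "rf" register-file prefix, and the run helper are all hypothetical, chosen only to show that a program is a list of data transports and that an operation fires as a side effect of writing a trigger port.

```python
class FunctionUnit:
    """Toy function unit (hypothetical model): one operand port, one trigger port."""

    def __init__(self, op):
        self.op = op          # two-input operation performed by this unit
        self.operand = 0      # latched first operand
        self.result = 0       # output register, readable by later moves

    def write(self, port, value):
        if port == "operand":
            self.operand = value                          # just latch the value
        elif port == "trigger":
            self.result = self.op(self.operand, value)    # writing here fires the op
        else:
            raise KeyError(port)


def run(moves, units, registers):
    """Execute a list of (source, destination) transports in order."""
    for src, dst in moves:
        if isinstance(src, int):                 # immediate value
            value = src
        else:
            name, port = src.split(".")
            # "rf.<reg>" reads a register; any other source reads a unit's result
            value = registers[port] if name == "rf" else units[name].result
        name, port = dst.split(".")
        if name == "rf":
            registers[port] = value
        else:
            units[name].write(port, value)
    return registers


units = {
    "add": FunctionUnit(lambda a, b: a + b),
    "mul": FunctionUnit(lambda a, b: a * b),
}
# (3 + 4) * 5 expressed purely as data transports
moves = [
    (3, "add.operand"),
    (4, "add.trigger"),
    ("add.result", "mul.operand"),   # unit-to-unit move, no register file involved
    (5, "mul.trigger"),
    ("mul.result", "rf.r0"),
]
registers = run(moves, units, {})
print(registers["r0"])
```

     The third move passes the adder's result straight to the multiplier without going through the register file, which is the kind of transport-level control that a conventional instruction set hides from software.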
     All of these processors target a 0.18 μm 1P6M CMOS technology, and the ME processor has been sold on the market.
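     Item 2 builds on high-radix Montgomery modular multiplication. As a rough illustration only, the following Python sketch shows the textbook word-serial form with radix 2^w, combined with left-to-right square-and-multiply exponentiation. It is not the RBHRMMM algorithm and does not model the CSSA hardware; the function names and the default word size w = 32 are assumptions made for the example. The modulus must be odd, and Python 3.8 or later is needed for the modular inverse via pow.

```python
def montgomery_multiply(a, b, n, w=32):
    """Return a * b * R^-1 mod n, where R = 2^(w*s) and s is the word count of n.

    Requires odd n and 0 <= a, b < n. One w-bit word of `a` is consumed per step.
    """
    radix = 1 << w
    mask = radix - 1
    s = (n.bit_length() + w - 1) // w          # number of w-bit words in n
    n_prime = (-pow(n, -1, radix)) & mask      # -n^-1 mod 2^w
    t = 0
    for i in range(s):
        a_i = (a >> (w * i)) & mask            # i-th word of a
        t += a_i * b
        q = ((t & mask) * n_prime) & mask      # makes t + q*n divisible by 2^w
        t = (t + q * n) >> w                   # exact division by the radix
    return t - n if t >= n else t              # single final correction


def mod_exp(base, exponent, n, w=32):
    """Compute base^exponent mod n using Montgomery multiplications."""
    s = (n.bit_length() + w - 1) // w
    r = 1 << (w * s)
    r2 = (r * r) % n
    x = montgomery_multiply(base % n, r2, n, w)    # base in Montgomery form
    acc = r % n                                    # 1 in Montgomery form
    for bit in bin(exponent)[2:]:                  # most significant bit first
        acc = montgomery_multiply(acc, acc, n, w)
        if bit == "1":
            acc = montgomery_multiply(acc, x, n, w)
    return montgomery_multiply(acc, 1, n, w)       # leave Montgomery form


n = (1 << 127) - 1
print(mod_exp(3, 65537, n) == pow(3, 65537, n))
```

     Each loop iteration performs two word-by-operand multiplications and one shift, with no trial division; this regular structure is what makes the method attractive for pipelined hardware.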