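     1. The XDNP heterogeneous multi-core network security processor was implemented, with every IP core developed in-house. The processor comprises one XD-MP Core, six packet processing engines (PEs), one security coprocessor unit, memory control units (SRAM and SDRAM), and a network data switching bus unit. The on-chip buses fall into two classes: control plane buses and data plane buses. An on-chip bus with a split, parallel switching structure is proposed: the data plane bus is separated into a command bus and data buses shared among the cores, so that bus requests arbitrated on the command bus can receive their data responses concurrently over different data buses, which greatly increases the on-chip bus transfer rate.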
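     4. The ECC cryptosystem relies on a large number of modular multiplications and modular squarings during encryption and decryption. Over the prime field, the traditional Montgomery modular multiplication is optimized and a 2-bit look-ahead Montgomery modular multiplier is proposed; in addition, the inherent properties of squaring are used to restructure the partial products, halving their number. On this basis a halved modular squaring algorithm is proposed and a circuit dedicated to modular squaring is designed, so that a modular squaring takes only half the time of a modular multiplication. Over the binary field, a word-serial modular multiplication algorithm is proposed in which the multiplicand is shifted left by two words, allowing a pipelined hardware implementation; it also simplifies the computation of key variables, shortens the circuit delay, and improves performance, so the modular product of two operands is computed quickly and efficiently. A double-word-serial modular squaring algorithm is proposed: it obtains the square directly from the inherent properties of binary-field squaring and then reduces it with the Montgomery method, processing two words at a time, so its computation time is also half that of a modular multiplication. An ECC dual-field cryptographic coprocessor with relatively high performance is built on these arithmetic circuits.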
High-speed security network processors play an increasingly important role in network development. Supported by a network processor project, this dissertation focuses on the hardware of a security network processor, including the ALU and the security cryptography circuits, and presents five main contributions:
     1. The XDNP heterogeneous multi-core security network processor was implemented, with all IP cores developed independently. XDNP consists of one XD-MP Core, six Packet Engines (PEs), one security cryptoprocessor, one SRAM controller, one SDRAM controller, and a Media and Switch Fabric Interface (MSF). There are two kinds of on-chip buses: a control plane bus and a data plane bus. A new on-chip bus architecture based on split transactions is proposed, in which the data plane bus is divided into two parts: a command bus shared by all cores, and several data buses, one for each core. This architecture allows different data buses to transfer data at the same time, which yields high throughput and low bus latency.
     2. The logic expressions for the block generate and block propagate signals of a fast adder were optimized so that they can be implemented in differential cascode voltage switch with pass-gate (DCVSPG) logic. This method solves the logic conflict problem of the static Manchester carry bypass circuit and eliminates the delay and power cost of the charge stage in the dynamic Manchester carry bypass circuit. It is faster and consumes less power than a CMOS standard-cell carry generate circuit. The way in which the size of each NMOS transistor in DCVSPG logic affects circuit performance is discussed, and a simple delay model of DCVSPG logic was built to evaluate the circuit delay. This delay model can be used to optimize the NMOS transistor sizes in an adder implemented in DCVSPG logic. A 32-bit adder was implemented in DCVSPG logic, and its performance is higher than that of an adder implemented with CMOS standard cells.
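As a behavioural illustration of the block generate and block propagate signals mentioned in item 2, the following Python sketch adds two words block by block. It models only the textbook carry equations, not the DCVSPG circuit or the optimized expressions of the thesis; the function name `add_with_block_carry` and the 4-bit block size are assumptions made for this example.

```python
def add_with_block_carry(a, b, width=32, block=4):
    """Add two unsigned integers using per-block generate/propagate signals.

    Behavioural model only (hypothetical helper, not the thesis circuit).
    Returns (sum modulo 2**width, carry out).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = a & b   # bit generate:  g_i = a_i AND b_i
    p = a ^ b   # bit propagate: p_i = a_i XOR b_i
    carry = 0   # carry into the current block
    carries = 0 # carry into every bit position, packed into one integer
    for lo in range(0, width, block):
        c = carry
        block_g, block_p = 0, 1
        for i in range(lo, min(lo + block, width)):
            carries |= c << i
            gi = (g >> i) & 1
            pi = (p >> i) & 1
            block_g = gi | (pi & block_g)  # block generate G
            block_p &= pi                  # block propagate P
            c = gi | (pi & c)              # ripple carry inside the block
        # Carry bypass: the block's carry out depends only on G, P and carry in.
        carry = block_g | (block_p & carry)
    return (p ^ carries) & mask, carry


if __name__ == "__main__":
    total, cout = add_with_block_carry(0xFFFFFFFF, 1)
    print(hex(total), cout)
```

The block carry out is `G | (P & carry_in)`, which is the relation a carry bypass stage evaluates in hardware.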
     3. Modular multiplication and exponentiation severely limit RSA performance. The thesis presents a modified Montgomery modular multiplication algorithm based on a two-level carry-save addition (CSA) tree. By inserting registers, the algorithm shortens the critical path and guarantees that the operands arrive at the CSA input ports simultaneously, which significantly improves the speed of modular multiplication. The sequence of modular multiplications in modular exponentiation was adjusted, which avoids most format conversions and reduces the conversion time. The proposed modular exponentiation circuit outperforms most representative designs.
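For readers unfamiliar with the operation optimized in item 3, a minimal software sketch of Montgomery modular multiplication and exponentiation follows. It uses the plain bit-serial (radix-2) form, not the two-level CSA tree or the register placement of the thesis; the names `mont_mul` and `mont_pow` are assumptions made for this example.

```python
def mont_mul(a, b, n, k):
    """Return a * b * 2**(-k) mod n, for odd n < 2**k and 0 <= a, b < n.

    Plain bit-serial Montgomery multiplication (illustrative only).
    """
    s = 0
    for i in range(k):
        s += ((a >> i) & 1) * b  # add the partial product for bit i of a
        if s & 1:                # make s even by adding the odd modulus
            s += n
        s >>= 1                  # exact division by 2
    if s >= n:                   # s < 2n here, so one subtraction suffices
        s -= n
    return s


def mont_pow(base, exp, n):
    """Return base**exp mod n for odd n > 1, staying in Montgomery form."""
    k = n.bit_length()
    r2 = pow(2, 2 * k, n)             # R^2 mod n, with R = 2**k
    x = mont_mul(base % n, r2, n, k)  # convert base into Montgomery form
    acc = mont_mul(1, r2, n, k)       # Montgomery form of 1
    for bit in bin(exp)[2:]:          # left-to-right square and multiply
        acc = mont_mul(acc, acc, n, k)
        if bit == "1":
            acc = mont_mul(acc, x, n, k)
    return mont_mul(acc, 1, n, k)     # convert back to the ordinary form


if __name__ == "__main__":
    print(mont_pow(5, 117, 1000003) == pow(5, 117, 1000003))
```

In this sketch the operands are converted into Montgomery form once on entry and converted back once on exit, so the loop itself performs no format conversion.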
     4. Elliptic Curve Cryptography involves a large number of modular multiplication and squaring operations over prime and binary finite fields. For the prime finite field, the traditional Montgomery algorithm was modified and a 2-bit prefix Montgomery modular multiplication circuit was designed. The partial products were then reconstructed based on the inherent characteristics of squaring, which reduces the number of partial products by half. A Half Number Partial-Products modular squaring algorithm was proposed, and a modular squaring circuit was designed based on it, so that a modular squaring takes only half the time of a modular multiplication. For the binary finite field, a word-serial modular multiplication algorithm was proposed. Because the multiplicand is shifted left by two words in the new algorithm, the multiplier can be pipelined. The new algorithm also simplifies the calculation of some key variables, which reduces the path delay of the multiplier circuit, so the word-serial modular multiplier computes the modular product quickly. A two-word-serial modular squaring algorithm was also proposed. It applies the Montgomery method to reduce the squaring result, which is obtained directly from the characteristics of squaring in binary finite fields. The algorithm handles two words at a time, so the calculation time of a modular squaring is half that of a modular multiplication. Finally, this thesis presents a highly efficient ECC dual-field processor built from these finite field arithmetic units.
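The two squaring properties that item 4 relies on can be sketched in a few lines of Python. The first function shows why an integer square needs only about half the partial products of a general product; the second shows that squaring in a binary field simply spreads the bits apart before reduction. Both are illustrations under assumptions: the names are hypothetical, and the reduction is ordinary polynomial reduction rather than the Montgomery reduction used in the thesis.

```python
def square_half_partial_products(a, width):
    """Square an unsigned integer using only the products with i <= j.

    a^2 = sum_i a_i * 2^(2i) + 2 * sum_{i<j} a_i * a_j * 2^(i+j),
    so each off-diagonal product is formed once and doubled by a shift.
    """
    bits = [(a >> i) & 1 for i in range(width)]
    diag = 0
    cross = 0
    for i in range(width):
        if bits[i]:
            diag += 1 << (2 * i)             # diagonal term a_i * a_i
            for j in range(i + 1, width):
                if bits[j]:
                    cross += 1 << (i + j)    # one of each symmetric pair
    return diag + (cross << 1)


def gf2_square(a, poly):
    """Square a in GF(2)[x] / (poly), with poly given as a bit mask."""
    m = poly.bit_length() - 1
    s = 0
    i = 0
    while a >> i:                            # spread: bit i moves to bit 2i
        if (a >> i) & 1:
            s |= 1 << (2 * i)
        i += 1
    for d in range(s.bit_length() - 1, m - 1, -1):
        if (s >> d) & 1:                     # clear degree d with a shifted poly
            s ^= poly << (d - m)
    return s


if __name__ == "__main__":
    print(square_half_partial_products(12345, 16) == 12345 ** 2)
    print(hex(gf2_square(0x10, 0x11B)))
```

The second call uses the degree-8 polynomial `0x11B` (x^8 + x^4 + x^3 + x + 1) purely as a small, well-known example field.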
     5. A switchable TAM architecture was presented in which some IP cores are attached to multiple TAMs through a switching circuit. These IP cores can therefore be tested by several TAMs, which effectively reduces idle time and test time. Each IP core was first allocated to a TAM by 0-1 programming under given constraints, and a heuristic search algorithm was then used to pick out appropriate IP cores to be tested by multiple TAMs. Experimental results on the ITC 2002 benchmark circuits show that this approach is better than some other approaches.
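The first step in item 5, allocating each core to exactly one TAM, can be pictured with a toy 0-1 assignment search. The sketch below is only an assumption-laden illustration: it enumerates every assignment instead of using a 0-1 programming solver, treats each core's test time as a fixed number independent of TAM width, and ignores the switching circuit and the later heuristic step. The function name and the test times are hypothetical.

```python
from itertools import product


def assign_cores(test_times, num_tams):
    """Exhaustively assign each core to one TAM, minimizing total test time.

    test_times[i] is a hypothetical test time for core i. TAMs run in
    parallel, so the total test time is the largest per-TAM load.
    Returns (best total test time, tuple mapping core index -> TAM index).
    """
    best_time, best_assign = None, None
    for assign in product(range(num_tams), repeat=len(test_times)):
        loads = [0] * num_tams
        for core, tam in enumerate(assign):
            loads[tam] += test_times[core]
        total = max(loads)
        if best_time is None or total < best_time:
            best_time, best_assign = total, assign
    return best_time, best_assign


if __name__ == "__main__":
    # Illustrative test times only; not taken from any benchmark.
    print(assign_cores([7, 5, 4, 3, 1], 2))
```

Exhaustive search grows as `num_tams ** len(test_times)`, which is why practical formulations rely on a solver or heuristics instead.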
