The performance of the floating-point multiplier is a bottleneck for high-performance microprocessors: the multiplier's architecture is very complex, and the latency of its circuit implementation is especially long. Optimizing the speed of the floating-point multiplier implementation is therefore very important for improving microprocessor performance. Semi-custom design alone cannot meet the ever-increasing frequency targets, so to reach the target, the partial product compression and accumulation stages are designed in full-custom. Combining full-custom and semi-custom design proves to be an effective way to optimize the floating-point multiplier.
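The following Python sketch illustrates what "partial product compression and accumulation" involves. It makes some assumptions: operands are unsigned, and the partial products come from a plain AND-array. A double-precision design would usually apply modified Booth recoding first, which roughly halves the row count. `compressor_4_2` is the conventional two-full-adder 4-2 compressor, not the novel circuit proposed in this thesis, whose structure the abstract does not describe. All function names are hypothetical. Rows are held as Python integers, so one bitwise operation models a whole row of compressors working side by side.

```python
from itertools import product


def full_adder(a, b, c):
    """1-bit full adder (3:2 counter): a + b + c == s + 2 * carry."""
    s = a ^ b ^ c
    carry = (a & b) | (a & c) | (b & c)
    return s, carry


def compressor_4_2(x1, x2, x3, x4, cin):
    """Conventional 4-2 compressor built from two chained full adders.

    Satisfies x1 + x2 + x3 + x4 + cin == s + 2 * (carry + cout).
    cout depends only on x1..x3, so it never waits for cin.
    """
    s1, cout = full_adder(x1, x2, x3)
    s, carry = full_adder(s1, x4, cin)
    return s, carry, cout


def csa_rows(a, b, c):
    """Row-level 3:2 carry-save add on integers: a + b + c == s + carry."""
    s = a ^ b ^ c
    carry = ((a & b) | (a & c) | (b & c)) << 1
    return s, carry


def compress_rows_4_2(r1, r2, r3, r4):
    """A row of 4-2 compressors, modeled as two carry-save levels."""
    s1, c1 = csa_rows(r1, r2, r3)  # first full adder of every column (c1 = shifted cout)
    return csa_rows(s1, r4, c1)    # second full adder absorbs r4 and the incoming carries


def reduce_partial_products(rows):
    """Compress partial-product rows until at most two remain (sum and carry rows)."""
    rows = list(rows)
    while len(rows) > 2:
        nxt = []
        i = 0
        while len(rows) - i >= 4:
            nxt.extend(compress_rows_4_2(*rows[i:i + 4]))
            i += 4
        rest = rows[i:]
        # Three leftover rows get a 3:2 stage; one or two pass through unchanged.
        nxt.extend(csa_rows(*rest) if len(rest) == 3 else rest)
        rows = nxt
    return rows


def multiply(a, b, n):
    """Unsigned n-bit multiply: AND-array partial products, 4-2 tree, final add."""
    rows = [(a << i) if (b >> i) & 1 else 0 for i in range(n)]
    return sum(reduce_partial_products(rows))  # final carry-propagate addition


# Exhaustively check the bit-level compressor identity.
for bits in product((0, 1), repeat=5):
    s, carry, cout = compressor_4_2(*bits)
    assert sum(bits) == s + 2 * (carry + cout)

print(multiply(0xB5, 0x6E, 8) == 0xB5 * 0x6E)  # True
```

Because `cout` does not depend on `cin`, carries move at most one column per compressor level rather than rippling across the row. Each 4-2 level halves the number of rows, which is what gives the tree its logarithmic depth.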
     The main results of this work are as follows:
     1. A novel 4-2 compressor is proposed and used in the partial product compression; its latency is 27.5% lower than that of the original 4-2 compressor;
     2. The 4-2 compressor designed in full-custom has a latency of 0.11 ns, while the semi-custom version has a latency of 0.18 ns; the full-custom design is therefore 39% faster;
     3. By analyzing how the bit width of the group adders relates to the latency of the carry tree, a method is given for implementing a high-speed 136-bit adder in which both the Sum and the Carry are computed fully in parallel. Designed in full-custom, this adder has a latency of 0.30 ns, making the latency of the partial product accumulation 21.3% lower than in the semi-custom design.
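The abstract does not say which carry tree the 136-bit adder uses. As an illustration only, the sketch below assumes a Kogge-Stone parallel-prefix network. In such a network, every carry (and hence every sum bit) is produced after a logarithmic number of prefix levels rather than by rippling. The function name and signature are hypothetical. The sketch does not model the group-adder partitioning the abstract mentions. The Python loop evaluates one prefix level at a time, but in hardware all nodes of a level switch simultaneously, so the depth is ceil(log2 136) = 8 levels.

```python
WIDTH = 136  # adder width quoted in the abstract


def kogge_stone_add(a, b, width=WIDTH, cin=0):
    """Add two width-bit operands with a Kogge-Stone parallel-prefix carry tree.

    Returns (sum, carry_out, prefix_levels).
    """
    mask = (1 << width) - 1
    a &= mask
    b &= mask
    g = [(a >> i) & (b >> i) & 1 for i in range(width)]    # bit generate
    p = [((a >> i) ^ (b >> i)) & 1 for i in range(width)]  # bit propagate

    G, P = g[:], p[:]
    G[0] = g[0] | (p[0] & cin)  # fold the carry-in into bit 0

    dist, levels = 1, 0
    while dist < width:
        # One prefix level: in hardware, all nodes here evaluate at the same time.
        G_next, P_next = G[:], P[:]
        for i in range(dist, width):
            G_next[i] = G[i] | (P[i] & G[i - dist])
            P_next[i] = P[i] & P[i - dist]
        G, P = G_next, P_next
        dist <<= 1
        levels += 1

    # G[i] is now the group generate over bits 0..i, i.e. the carry out of bit i.
    carries = [cin] + G[:-1]
    total = 0
    for i in range(width):
        total |= (p[i] ^ carries[i]) << i
    return total, G[width - 1], levels


x = 0x0123456789ABCDEF0123456789ABCDEF01
y = 0xFEDCBA9876543210FEDCBA9876543210FF
s, cout, levels = kogge_stone_add(x, y)
assert s == (x + y) & ((1 << WIDTH) - 1) and cout == (x + y) >> WIDTH
print(levels)  # 8
```

The operands in the usage example were chosen so that the carry propagates across all 136 bits. This worst case would dominate a ripple-carry adder, yet the prefix tree handles it in the same 8 levels as any other input.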
     In a 65 nm technology, the synthesized frequency of the optimized floating-point multiplier is 1.8 GHz, nearly 30% higher than the 1.4 GHz achieved by the semi-custom design. After physical design of the multiplier (placement and routing), its frequency is 1.36 GHz.
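Two of the percentages quoted in the abstract can be recomputed from the absolute figures it gives. The short Python check below does this. The helper names are hypothetical, and the inputs are only the latencies and frequencies stated in the abstract. The frequency gain from 1.4 GHz to 1.8 GHz works out to about 28.6%, which the abstract rounds to roughly 30%.

```python
def reduction_pct(before, after):
    """Percentage by which a latency drops from `before` to `after`."""
    return (before - after) / before * 100


def increase_pct(before, after):
    """Percentage by which a frequency rises from `before` to `after`."""
    return (after - before) / before * 100


# 4-2 compressor latency in ns: semi-custom 0.18, full-custom 0.11
print(f"{reduction_pct(0.18, 0.11):.1f}%")  # 38.9%
# Synthesized frequency in GHz: semi-custom 1.4, optimized 1.8
print(f"{increase_pct(1.4, 1.8):.1f}%")     # 28.6%
# Clock periods implied by the quoted frequencies
for f_ghz in (1.4, 1.8, 1.36):
    print(f"{f_ghz} GHz -> {1 / f_ghz:.3f} ns")  # 0.714, 0.556, 0.735 ns
```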