Research on Buffer Optimization for Network-on-Chip Routers
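Abstract
As semiconductor process dimensions keep shrinking and system integration keeps rising, the focus of SoC architecture has shifted from single-core, computation-centric design to multi-core, communication-centric design. In a complex multi-core SoC, the power consumed by communication is already comparable to that consumed by logic and storage, and is expected to exceed it. Chip area and power budgets are increasingly dominated by the on-chip interconnect, which has drawn close attention to on-chip interconnect design and has become a major driver of NoC optimization research. Router buffers account for a large share of the power and area of the whole network and have a significant impact on network performance, so optimizing the buffers of network-on-chip routers is of great importance.
     To guide buffer optimization, a traffic characterization model based on the parameter tuple (λ, H, α, δ) is presented; the model accurately quantifies the load characteristics of an application. Classic traffic models are also used to analyze the general distribution of buffer utilization in an on-chip network, revealing that buffer utilization is unevenly distributed in both time and space. This motivates an approach that combines dynamic and static optimization, under which the following two buffer optimization schemes were designed.
     1) For the general virtual-channel router architecture, a buffer optimization method is proposed that combines static allocation of the number of virtual channels with dynamic power gating. First, a model of output blocking probability in the on-chip network is built, and the number of virtual channels at each port of each node is allocated according to the blocking probabilities, balancing virtual channel utilization across the network. Power gating is then introduced to switch off virtual channels that are idle at run time, reducing the static power overhead of the on-chip network. Buffers are thus optimized along both the spatial and temporal dimensions of their utilization. Experimental results show that, compared with buffer allocation alone or power gating alone, the hybrid algorithm saves 20.30%~44.53% of energy at low load and about 8.5% even near the saturation injection rate.
     2) For router architectures in which the function of each virtual channel is restricted, a buffer optimization method is proposed that combines static buffer capacity allocation with a dynamic buffer structure. First, a simplified router model with a single buffer channel is used for queueing analysis to compute the buffer-full probability of each port in the network; a greedy algorithm then allocates buffer capacity to each port on that basis, removing the spatial redundancy of buffers in the network. A dynamic buffer structure is then designed so that the virtual channels of one port can share all of its free buffer space, making full use of the limited buffer resources. The contention among data within a single port, which the static allocation algorithm ignores, is exactly what the dynamic buffer structure compensates for; combining the two methods further reduces the buffer overhead of the on-chip network. Compared with buffer allocation alone or the dynamic buffer structure alone, the combined method saves 10.00%~46.43% of buffer overhead.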
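The greedy capacity allocation in scheme 2) can be sketched as follows. This is only an illustration under stated assumptions, not the thesis's actual model: each port is treated as an M/M/1/K queue with a given offered load, and the names `full_probability`, `greedy_allocate`, `loads`, and `budget` are hypothetical. At every step, one more buffer slot goes to the port whose buffer is currently most likely to be full.

```python
def full_probability(rho: float, depth: int) -> float:
    """Probability that an M/M/1/K queue with K = depth is full.

    Assumption: the M/M/1/K port model is chosen for illustration only.
    """
    if abs(rho - 1.0) < 1e-12:
        return 1.0 / (depth + 1)
    return (1.0 - rho) * rho**depth / (1.0 - rho ** (depth + 1))


def greedy_allocate(loads: dict[str, float], budget: int,
                    min_depth: int = 1) -> dict[str, int]:
    """Distribute `budget` buffer slots over ports, one slot at a time.

    loads maps a (hypothetical) port name to its offered load rho.
    """
    depth = {port: min_depth for port in loads}
    remaining = budget - min_depth * len(loads)
    if remaining < 0:
        raise ValueError("budget is smaller than the minimum allocation")
    for _ in range(remaining):
        # Give the next slot to the port most likely to be full right now.
        worst = max(depth, key=lambda p: full_probability(loads[p], depth[p]))
        depth[worst] += 1
    return depth


if __name__ == "__main__":
    example_loads = {"north": 0.9, "east": 0.5, "local": 0.1}
    print(greedy_allocate(example_loads, budget=12))
```

Heavily loaded ports end up with deeper buffers, while lightly loaded ports keep close to the minimum depth, which is the spatial redundancy the static allocation is meant to remove.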
As semiconductor feature sizes keep scaling down and on-chip systems grow more complex, computation-centric single-core design is giving way to communication-centric multi-core design. In a complex multi-core SoC, the power consumed by communication is close to that of computation and storage and is expected to exceed it in the near future. Chip area and power budgets are therefore increasingly constrained by the interconnect, which motivates research on on-chip interconnect design and optimization. Because the buffers in on-chip routers consume a large proportion of network power and area and also have a significant effect on performance, buffer optimization is an essential step before a network-on-chip (NoC) is implemented.
     To guide buffer optimization, an application-oriented traffic model based on the parameters (λ, H, α, δ) is proposed, which quantifies traffic characteristics precisely. A classic traffic model is also applied to explore the general distribution of on-chip buffer utilization. Buffer utilization turns out to be highly unbalanced both temporally and spatially, so a method that combines static and dynamic buffer optimization is proposed. Guided by this method, two buffer optimization strategies for different router architectures are developed, as described below.
     1) For the general virtual channel router architecture, a buffer optimization strategy that combines static virtual channel number allocation with dynamic power gating is put forward. To balance buffer utilization across the network, the number of virtual channels for each input port of each router is pre-configured according to the output-port blocking probability derived from a router queueing model. Power gating is then introduced to shut down idle VCs at run time. The two techniques eliminate spatial and temporal buffer redundancy, respectively. Simulation results show that the combined optimization outperforms either static buffer allocation or dynamic power gating alone, with 20.30%~44.53% more power saving at light load and approximately 8.5% more power saving even when the network is near saturation.
     2) For routers with functionally constrained virtual channels, a buffer optimization strategy that combines a static buffer capacity allocation algorithm with a dynamic buffer architecture is presented. First, the buffer-full probability is calculated with a simplified router model that has a single buffer channel at each input port, and buffer capacities are then allocated according to these probabilities using a greedy algorithm. Next, a scalable dynamic buffer architecture is designed to adjust virtual channel depth at run time, so that any idle buffer word can be used by any virtual channel of the same input port. This compensates for the contention within an input port that the static allocation algorithm neglects, so the combined strategy performs better than either technique alone. Simulation results confirm that 10.00%~46.43% more power saving can be achieved compared with either the buffer capacity allocation algorithm or the dynamic buffer architecture alone.
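The two halves of strategy 1), static virtual channel allocation and run-time power gating, can be illustrated with a small sketch. Everything here is assumed for illustration rather than taken from the thesis: the proportional (largest-remainder) allocation rule, the idle-cycle threshold policy, and the names `allocate_vcs` and `GatedVC`. Wake-up latency and gating energy overhead are ignored.

```python
from dataclasses import dataclass


def allocate_vcs(block_prob: dict[str, float], total_vcs: int,
                 min_vcs: int = 1) -> dict[str, int]:
    """Split total_vcs across ports in proportion to blocking probability.

    Assumption: a simple largest-remainder rule stands in for the
    thesis's allocation procedure, which the abstract does not detail.
    """
    ports = list(block_prob)
    spare = total_vcs - min_vcs * len(ports)
    if spare < 0:
        raise ValueError("total_vcs is smaller than the minimum allocation")
    weight = sum(block_prob.values())
    if weight > 0:
        shares = {p: spare * block_prob[p] / weight for p in ports}
    else:
        shares = {p: spare / len(ports) for p in ports}
    alloc = {p: min_vcs + int(shares[p]) for p in ports}
    leftover = total_vcs - sum(alloc.values())
    # Hand the remaining VCs to the ports with the largest fractional share.
    by_fraction = sorted(ports, key=lambda p: shares[p] - int(shares[p]),
                         reverse=True)
    for p in by_fraction[:leftover]:
        alloc[p] += 1
    return alloc


@dataclass
class GatedVC:
    """One virtual channel that is powered off after a run of idle cycles."""
    idle_threshold: int  # hypothetical tuning parameter
    idle_cycles: int = 0
    powered: bool = True

    def tick(self, occupied: bool) -> None:
        if occupied:
            self.idle_cycles = 0
            self.powered = True  # wake-up latency ignored in this sketch
        else:
            self.idle_cycles += 1
            if self.idle_cycles >= self.idle_threshold:
                self.powered = False


if __name__ == "__main__":
    probs = {"north": 0.4, "east": 0.3, "south": 0.1, "west": 0.2}
    print(allocate_vcs(probs, total_vcs=12))
```

The allocation runs once at design time and addresses the spatial imbalance, while `GatedVC.tick` would be evaluated every cycle and addresses the temporal one.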
