Design of a Network-on-Chip Interconnect for Homogeneous Multiprocessors
Abstract
Advances in integrated circuit technology keep increasing the processing resources available on a single chip. For power and performance reasons, those resources are increasingly divided among many small cores rather than a few powerful ones. In addition, global wires in deep submicron (DSM) processes suffer from delay, power, and driving-load problems, so the conventional bus is no longer a suitable interconnect for future systems on chip.

A network on chip (NoC) brings the packet switching and routing mechanisms of macro-scale networks onto a single chip to interconnect a large number of IP cores. It offers good scalability, a regular structure, and good modularity, and with the development of 3D IC technology it has become a widely researched topic.

This thesis designs the interface (a network adapter) between homogeneous processors and the on-chip network, and builds a multiprocessor system interconnected by an NoC. It then studies how to keep memory from becoming the performance and throughput bottleneck of such a system. The distributed shared memory concept is applied to the NoC-connected multiprocessor, and the configuration is optimized for this architecture through software methods. In a final multithreaded benchmark, the design showed improved performance and higher network throughput.
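The memory bottleneck argument can be illustrated with a minimal sketch. Everything specific in it is an assumption made for illustration and does not come from the thesis: the 4x4 mesh, the 64-byte block interleaving, and the names `home_node`, `hops`, and `bank_load` are all hypothetical. It only shows why spreading shared memory across nodes lowers the request load on any single memory bank compared with one central bank.

```python
from collections import Counter

MESH_W, MESH_H = 4, 4   # assumed 4x4 mesh; the abstract does not state a topology
BLOCK = 64              # assumed interleaving granularity in bytes

def home_node(addr, nodes=MESH_W * MESH_H, block=BLOCK):
    """Node whose local memory bank holds this address (block-interleaved)."""
    return (addr // block) % nodes

def coords(node):
    """(x, y) position of a node in the mesh, numbered row by row."""
    return node % MESH_W, node // MESH_W

def hops(src, dst):
    """Manhattan distance between two mesh nodes (a lower bound on hop count)."""
    (sx, sy), (dx, dy) = coords(src), coords(dst)
    return abs(sx - dx) + abs(sy - dy)

def bank_load(addresses, distributed=True):
    """Count how many requests arrive at each memory bank."""
    if distributed:
        return Counter(home_node(a) for a in addresses)
    return Counter(0 for _ in addresses)  # single central bank at node 0

# 64 consecutive blocks, each touched once.
addresses = [i * BLOCK for i in range(64)]
central = bank_load(addresses, distributed=False)
spread = bank_load(addresses, distributed=True)
print(max(central.values()), max(spread.values()))  # prints "64 4"
```

With a single bank, all 64 requests converge on one node and the links around it. With interleaving, the busiest bank sees only 4 requests, at the cost of some requests travelling several hops, which is the trade-off that software placement of data and threads would try to tune.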
