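Research on the Architecture of Coarse-grain Dataflow Network Processor

Original title: 粗粒度数据流网络处理器体系结构研究
Author: Li Tao (李韬)
Degree level: Master's
Discipline: Computer Science and Technology
Keywords: Network Processor; Coarse-grain; Dataflow; Architecture
Degree year: 2006
Advisor: Lu Xicheng (卢锡城)
Discipline code: 081203
Degree-granting institution: National University of Defense Technology (国防科学技术大学)
Thesis submission date: 2006-05-01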

Abstract

To keep pace with ever-growing network bandwidth, the design of network processor (NP) architectures must pay closer attention to matching the processing characteristics of network packet flows. A dataflow-based NP architecture can exploit the flow-processing nature of network applications well.
     To address the shortcomings of control-flow NPs in exploiting instruction-level parallelism (ILP) and in their relatively fixed topology, this thesis brings the design ideas of the coarse-grain dataflow model into NP architecture design, proposes a new Coarse-grain Dataflow Network Processor architecture (CDNP), and studies several key techniques of CDNP in depth. The main work and contributions are as follows:
     (1) To address the problems of control-flow NPs and the limited programmability of fine-grain synchronous dataflow NPs, a new coarse-grain dataflow NP architecture, CDNP, is proposed. Building on the dataflow model, CDNP improves the programmability of the whole NP by introducing a control-flow structure into the design of the Processing Element (PE). It also uses the dataflow model's strength in exploiting ILP to exploit the task-level parallelism in the workload effectively. As a result, it achieves relatively high processing performance and flexibility.
     (2) Based on the dataflow characteristics of packet processing in CDNP, the key techniques of the PE are studied. First, a scheme is proposed for selecting the PE's micro-core (uCore) instruction set and implementing its basic logic functions. Second, the token processing mechanism of the PE is defined to fit the dataflow-driven nature of token processing in CDNP: the token processing module must provide the basic functions of receiving, buffering, converting, encapsulating, and sending tokens while fully accounting for that dataflow-driven nature. Finally, for the management of the frame buffer (FB), a hardware linked-list design is proposed so that packet order can be preserved for different types of workloads.
     (3) Based on the soft configuration mechanism for token processing paths in CDNP, a dynamic token processing path scheduling algorithm, DTPPS (Dynamic Token Processing Path Scheduling), is proposed. DTPPS monitors the load on each PE in CDNP. When the load becomes unbalanced, it adjusts the token processing paths of the workload and remaps the tasks running on heavily loaded PEs. Simulation results show that the algorithm balances the workload across PEs well and effectively improves the overall throughput of CDNP.
     The thesis also presents the design of a CDNP prototype system based on SoPC (System on Programmable Chip) technology. The prototype connects four PEs and several functional modules through a high-speed on-chip communication network, and can be used to verify the functions of CDNP as well as key techniques and algorithms such as token processing path scheduling. The work in this thesis offers important guidance for the design of NPs.
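
The abstract describes DTPPS only at a high level: monitor the load on each PE, and when the load is unbalanced, remap tasks away from heavily loaded PEs by adjusting token processing paths. The Python sketch below illustrates that general idea and is not the thesis's algorithm. The `PE` class, the `rebalance` function, the per-task load estimates, the imbalance `threshold`, and the greedy task-selection rule are all assumptions made for illustration. Each task stands for one stage of a token processing path, so moving a task changes which PE that path visits.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class PE:
    """A processing element and the tasks currently mapped onto it."""
    name: str
    # Hypothetical per-task load estimates (arbitrary units).
    tasks: dict[str, float] = field(default_factory=dict)

    @property
    def load(self) -> float:
        return sum(self.tasks.values())


def rebalance(pes: list[PE], threshold: float = 0.2,
              max_moves: int = 100) -> list[tuple[str, str, str]]:
    """Greedily remap tasks from the heaviest PE to the lightest PE.

    Stops when (max load - min load) / mean load <= threshold, when no
    move can narrow the gap, or after max_moves moves.
    Returns the moves as (task, from_pe, to_pe) tuples.
    """
    moves: list[tuple[str, str, str]] = []
    for _ in range(max_moves):
        mean = sum(pe.load for pe in pes) / len(pes)
        heavy = max(pes, key=lambda pe: pe.load)
        light = min(pes, key=lambda pe: pe.load)
        gap = heavy.load - light.load
        if mean == 0 or gap / mean <= threshold:
            break  # balanced enough under the assumed threshold
        # Only tasks with 0 < cost < gap strictly reduce the imbalance.
        candidates = [(t, c) for t, c in heavy.tasks.items() if 0 < c < gap]
        if not candidates:
            break
        # Prefer the task whose cost is closest to half the gap.
        task, cost = min(candidates, key=lambda tc: abs(tc[1] - gap / 2))
        del heavy.tasks[task]
        light.tasks[task] = cost
        moves.append((task, heavy.name, light.name))
    return moves


if __name__ == "__main__":
    pes = [
        PE("pe0", {"classify": 4.0, "lookup": 3.0, "encrypt": 3.0}),
        PE("pe1", {"forward": 2.0}),
    ]
    print(rebalance(pes))
    print({pe.name: pe.load for pe in pes})
```

In a real NP the load estimates would come from hardware counters or queue depths, and a remapping would also have to respect packet ordering. Both concerns are left out of this sketch.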
