Research on the Design of Application-Specific Processors and On-Chip Communication Architectures
Abstract
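Multimedia technology, with video at its core, is among the most characteristic and dynamic fields of research and application in the 21st century, and it is also a field in which high-performance systems on chip (System on a Chip, SoC) play a central role. Application-specific processors and application-specific SoCs strike a design trade-off between the programmability and flexibility of general-purpose processors and the high performance and low power of application-specific integrated circuits (Application-Specific Integrated Circuit, ASIC), and have become an active research topic. Taking the latest video coding standard, H.264/AVC, as the target application, this thesis discusses two key techniques in high-performance application-specific SoC design: the design of application-specific instruction set processors (Application-Specific Instruction Set Processor, ASIP) and the design of application-specific on-chip communication architectures.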
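     The processor is the core of an SoC and carries out most of the system's computation and control tasks. This thesis presents the instruction set architecture and hardware implementation of an in-house ASIP for video compression. Video workloads incur high data organization overhead and frequent memory accesses to non-contiguous addresses. To address this, the instruction set adopts an instruction format with explicit data organization, embedding data organization operations in the instruction encoding, and provides a row/column interleaved memory access mode. The hardware is a hybrid of SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) built on a RISC (Reduced Instruction Set Computer) style pipeline. Its distributed arithmetic units support parallel computation with variable word lengths, the instruction fetch unit handles the fetching of variable-length encoded instructions and the computation of the next PC (Program Counter), and an instruction buffer reduces memory accesses.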
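     The design space of application-specific instructions is large, and manual design is inefficient. This thesis explores the automatic extraction of application-specific instructions and proposes a method that combines static data flow graph search with dynamic filtering of the results. The method was applied as an important complement to the ASIP instruction set design described above. On this basis, using the in-loop deblocking filter of H.264/AVC as an example, the thesis describes how a video compression kernel is optimized on the ASIP.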
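     As more components are integrated on a chip, data communication between them is becoming the bottleneck that limits system performance. Using the concrete data communication characteristics of the target application to guide the design of the communication architecture can shorten communication time and improve the system's real-time performance. This requires accurate high-level modeling and simulation of the application's data communication early in the design. Building on the transaction level model (Transaction Level Model, TLM), this thesis proposes a new abstraction model, CEAM (Communication Event Accurate Model), for system communication modeling, and uses it to model and simulate the communication of H.264/AVC. On this basis, an application-optimized bus scheduling policy is designed. Experimental results show that the policy greatly improves bus utilization and shortens the completion time of communication tasks.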
High-performance SoCs (Systems on a Chip) play an important role in multimedia applications, especially video compression, which is ubiquitous in daily life. Application-specific SoCs (AS-SoCs) can achieve a good trade-off between the flexibility of general-purpose computing platforms and the performance and efficiency of ASICs (Application-Specific Integrated Circuits). They are now an active research topic in both academia and industry. This thesis focuses on the design of Application-Specific Instruction Set Processors (ASIPs) and on-chip communication architectures, both of which are critical techniques in AS-SoCs.
     As the core component of an AS-SoC, an ASIP provides high computational performance while retaining the flexibility of a programmable device. This dissertation introduces a video-oriented ASIP design. It uses a hybrid SIMD (Single Instruction Multiple Data) and VLIW (Very Long Instruction Word) architecture with an EDO (Explicit Data Organization) enhancement. Data permutation and reorganization functions are specified explicitly in the instruction encoding rather than by extra permutation instructions, which significantly reduces code size. The design of the split ALU (Arithmetic Logic Unit) and of the IFU (Instruction Fetch Unit), which supports variable-length instruction encoding, is described. An instruction buffer embedded in the IFU significantly reduces accesses to the main instruction memory.
     An application-specific instruction synthesis approach is also proposed in this dissertation. The application is first converted to a directed data flow graph, and a search algorithm is then applied to it to extract optimized instructions. This approach has been applied in the design of the ASIP described above. The instruction and data path optimization targeting the in-loop filter of H.264/AVC is elaborated, together with the modifications made to the reference code to improve parallelism.
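     As a rough illustration of a static search over a data flow graph, the following Python sketch enumerates connected subgraphs of a small graph and keeps those that fit a register-port budget. The graph, the node names, the limits on inputs and outputs, and the `candidates` and `is_connected` helpers are all assumptions made for this example. The dissertation's actual search algorithm and its dynamic filtering stage are not reproduced here, and the convexity check that a practical tool would need is omitted.

```python
from itertools import combinations

# Hypothetical data flow graph for one basic block: node -> (operation, operands).
# Operands that are not keys of the dict are external inputs (registers, constants).
DFG = {
    "t1": ("sub", ["p0", "q0"]),
    "t2": ("abs", ["t1"]),
    "t3": ("cmp", ["t2", "alpha"]),
    "t4": ("add", ["p1", "q1"]),
}


def is_connected(dfg, consumers, chosen):
    """Check that the chosen nodes form one connected piece (edges taken as undirected)."""
    start = next(iter(chosen))
    seen, stack = {start}, [start]
    while stack:
        node = stack.pop()
        neighbours = (set(dfg[node][1]) | consumers[node]) & chosen
        for nxt in neighbours - seen:
            seen.add(nxt)
            stack.append(nxt)
    return seen == chosen


def candidates(dfg, max_nodes=3, max_inputs=2, max_outputs=1):
    """Yield (nodes, inputs, outputs) for each connected subgraph within the port limits."""
    # Map every node to the set of nodes that consume its result.
    consumers = {n: set() for n in dfg}
    for node, (_, operands) in dfg.items():
        for src in operands:
            if src in consumers:
                consumers[src].add(node)
    for size in range(2, max_nodes + 1):
        for subset in combinations(sorted(dfg), size):
            chosen = set(subset)
            if not is_connected(dfg, consumers, chosen):
                continue
            # Inputs: operands produced outside the candidate.
            inputs = {s for n in chosen for s in dfg[n][1] if s not in chosen}
            # Outputs: results used outside the candidate, or never consumed (assumed live-out).
            outputs = {n for n in chosen if not consumers[n] or consumers[n] - chosen}
            if len(inputs) <= max_inputs and len(outputs) <= max_outputs:
                yield subset, sorted(inputs), sorted(outputs)
```

     With the default limits, `list(candidates(DFG))` yields two candidates, (t1, t2) and (t2, t3). Raising `max_inputs` to 3 also admits the three-node chain (t1, t2, t3).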
     As more and more components are integrated into a single chip, data communication between components is now on the critical path of AS-SoC design. To model the communication of the target application accurately, a new abstraction level called CEAM (Communication Event Accurate Model) is proposed, and the data communication of H.264/AVC has been modeled and simulated in CEAM. Profiling data is gathered during the simulation, and an application-specific bus scheduling scheme is designed based on this data. Experimental results show that the proposed scheduling scheme significantly outperforms general-purpose schemes such as RR (Round-Robin) and FP (Fixed Priority).
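     For context on the two general-purpose baselines, the following Python sketch simulates a single shared bus under fixed-priority and round-robin arbitration. The workload, the cycle counts, and the function names are hypothetical and chosen only for illustration. The application-specific scheme proposed in the dissertation and the CEAM model are not reproduced here.

```python
from collections import deque

# Hypothetical workload: (arrival_cycle, master_id, burst_cycles).
# Master 0 queues four 4-cycle bursts; masters 1 and 2 queue one 2-cycle burst each.
WORKLOAD = [(0, 0, 4)] * 4 + [(0, 1, 2), (0, 2, 2)]


def fixed_priority(ready, last_granted, n_masters):
    """Grant the ready master with the smallest id (assumed highest priority)."""
    return min(ready)


def round_robin(ready, last_granted, n_masters):
    """Grant the first ready master after the one granted most recently."""
    for step in range(1, n_masters + 1):
        candidate = (last_granted + step) % n_masters
        if candidate in ready:
            return candidate
    raise ValueError("no ready master")


def simulate(requests, arbiter, n_masters):
    """Return {master: cycle at which its last transfer finished}.

    The bus serves one non-preemptive transfer at a time, and each
    master issues its own requests in arrival order.
    """
    queues = [deque() for _ in range(n_masters)]
    for arrival, master, burst in sorted(requests):
        queues[master].append((arrival, burst))
    cycle, last_granted, finish = 0, n_masters - 1, {}
    while any(queues):
        ready = {m for m, q in enumerate(queues) if q and q[0][0] <= cycle}
        if not ready:
            # Bus idles until the next request arrives.
            cycle = min(q[0][0] for q in queues if q)
            continue
        master = arbiter(ready, last_granted, n_masters)
        _, burst = queues[master].popleft()
        cycle += burst
        finish[master] = cycle
        last_granted = master
    return finish
```

     On this workload, `simulate(WORKLOAD, fixed_priority, 3)` finishes masters 0, 1, and 2 at cycles 16, 18, and 20, whereas `simulate(WORKLOAD, round_robin, 3)` finishes them at cycles 20, 6, and 8. Total bus time is the same in both cases, but the waiting time of each master differs, which is the kind of difference that profiling data could inform.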
