Research on a Flexibly Configurable H.264 Video Decoding Subsystem Based on Fine-Grained Acceleration Units
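Abstract
With the continuing advance of integrated-circuit manufacturing processes and the rapid growth of embedded applications, SoCs are used ever more widely in the embedded field. As a leading video compression technology, the H.264/AVC standard has become an indispensable feature of multimedia embedded products. Designing an efficient H.264/AVC video decoding architecture has become one of the difficulties and challenges in developing embedded SoC products. A pure-software solution that relies solely on the embedded processor to decode high-definition H.264/AVC video has low energy efficiency and struggles to meet the performance requirements of embedded systems. Integrating an H.264/AVC decoder IP core to accelerate decoding in hardware has become the mainstream approach in current SoC design; however, this places demands on the configurability and scalability of the H.264/AVC decoder.
     To address the poor configurability and extensibility and the cumbersome configuration of traditional dedicated H.264/AVC decoder IP cores, this work proposes a flexibly configurable H.264/AVC decoding subsystem architecture based on fine-grained acceleration units. The architecture not only supports fine-grained configurability of the various computational hardware resources, but also allows the processor, after the SoC design is complete, to dynamically configure and invoke hardware resources according to the characteristics of the video, improving decoding energy efficiency. To improve the reusability of the subsystem, this work further packages the H.264/AVC decoding subsystem using the IP-XACT standard, so that it can be integrated into SoC chips quickly and efficiently.
     The research covers the following four aspects:
     1) The software algorithms of the H.264/AVC video decoder are studied and the complexity of each decoding function module is evaluated, leading to a hardware/software architecture for the video decoding subsystem composed of a RISC processor and fine-grained acceleration unit modules. Using a bottom-up design method, the decoding application is partitioned by functional category into multiple fine-grained acceleration unit modules, which operate concurrently through a pipelined architecture. According to the characteristics of the video, the processor flexibly configures the task-flow architecture (the cascading between acceleration units) in software, achieving dynamic allocation of decoding data and further improving the parallel efficiency of decoding.
     2) Each fine-grained acceleration unit module in the subsystem is designed individually, with a customized micro-architecture and interface, improving performance and data throughput while optimizing area and power. Based on these modules, a scalable bus and memory architecture for the decoding subsystem is defined to meet the requirements of accelerator memory access and inter-accelerator data communication.
     3) Based on the CKSoC integration platform, the subsystem is packaged using the IP-XACT standard, which improves its portability across different SoC design environments and makes its configuration and integration more efficient. According to the differing requirements of the video coding, the number of each acceleration unit, the subsystem memory size, and the processor type can be flexibly configured to reach the best trade-off between performance and power/area.
     4) Experimental results of the H.264/AVC decoding subsystem under different configurations are analyzed. SoC samples containing the decoding subsystem are generated with the CKSoC integration platform, and the performance of each acceleration unit is measured and analyzed. Guided by these per-unit results, the configuration of the H.264/AVC decoding subsystem is adjusted; at the same time, the data partitioning of the decoded picture is optimized under hardware/software cooperation, and the decoding performance is further compared and analyzed. The experiments show that, for different H.264 video stream tasks, the subsystem architecture offers good configurability and scalability while also meeting the demand for efficient real-time decoding.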
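The software-configured task flow described in aspect 1 (the cascading between acceleration units) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the unit names, the `configure_task_flow` helper, and the choice of switching flows by macroblock type are hypothetical and are not taken from the subsystem itself. Each "unit" is a placeholder that only records that it ran.

```python
from typing import Callable, Dict, List

# A stage stands in for one hardware acceleration unit (hypothetical model).
Stage = Callable[[dict], dict]


def _make_unit(name: str) -> Stage:
    """Build a placeholder unit that only records that it ran."""
    def unit(mb: dict) -> dict:
        return {**mb, "trace": mb["trace"] + [name]}
    return unit


# Hypothetical acceleration units; the names are illustrative only.
UNITS: Dict[str, Stage] = {
    name: _make_unit(name)
    for name in ("entropy_decode", "inverse_transform",
                 "intra_prediction", "motion_compensation", "deblocking")
}


def configure_task_flow(unit_names: List[str]) -> Stage:
    """Cascade the named units into one task flow, in the given order."""
    stages = [UNITS[name] for name in unit_names]  # KeyError on unknown unit

    def run(mb: dict) -> dict:
        for stage in stages:
            mb = stage(mb)
        return mb
    return run


# Two example flows that control software might select between (assumed).
INTRA_FLOW = configure_task_flow(
    ["entropy_decode", "inverse_transform", "intra_prediction", "deblocking"])
INTER_FLOW = configure_task_flow(
    ["entropy_decode", "inverse_transform", "motion_compensation", "deblocking"])


def decode_macroblock(mb_index: int, is_intra: bool) -> dict:
    """Run one macroblock through the flow chosen for its type."""
    flow = INTRA_FLOW if is_intra else INTER_FLOW
    return flow({"index": mb_index, "trace": []})
```

Calling `decode_macroblock(0, is_intra=True)` returns a record whose `trace` lists the units in the order they were cascaded. In hardware the equivalent step would be the processor writing routing or enable settings for each unit, which this sketch does not model.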
With the rapid progress of silicon process technology and the fast development of embedded applications, the System-on-Chip (SoC) increasingly dominates the embedded electronics market. Owing to its coding efficiency, the H.264/AVC video standard has become an important feature of embedded multimedia systems. SoC designers therefore have to provide an efficient H.264/AVC decoding solution in SoCs that target multimedia applications. Decoding an H.264/AVC HD video stream is still a challenge for an embedded processor alone, so most SoCs have to contain an H.264/AVC decoder IP, which in turn requires IP designers to consider IP reuse and configurability during IP design. In this work, we avoid the shortcomings of the traditional ASIC IP design method and propose an H.264/AVC decoder sub-system based on fine-grained accelerators. It supports scalable hardware configuration according to the requirements of the SoC. Moreover, the hardware resources of the sub-system can be scheduled by the processor according to the features of the video stream. To improve reuse of the video sub-system, we also use the IP-XACT standard to package the sub-system. The research work covers the following four aspects:
     1) Estimate the complexity of the H.264/AVC decoding tasks by analyzing the reference software of the algorithm, and then complete the software-hardware partitioning. Adopting a bottom-up approach, the hardware part is refined into several accelerator modules. Each accelerator module is designed individually by customizing its micro-architecture, optimizing power and area, and increasing throughput.
     2) Based on the accelerators, design the bus and memory architecture of the H.264/AVC sub-system to meet the communication and memory access requirements. The architecture supports flexible configuration of accelerator number and function, memory size, processor type, and so on. Furthermore, the sub-system can be controlled and scheduled in a distributed manner. According to the features of the video stream, the task pipeline can be modified dynamically, and the video data can be partitioned and assigned to different accelerators in cooperation with the processor.
     3) Based on the CKSoC integration platform, package the sub-system using the IP-XACT standard. This increases the reusability of the sub-system and automates its configuration.
     4) Analyze the performance of the H.264/AVC decoding sub-system under different configurations. Different SoC platforms containing the H.264/AVC decoding sub-system are generated using the CKSoC integration platform. Estimate and analyze the performance of each accelerator and, according to these results, reconfigure the H.264/AVC decoding sub-system. Then analyze and compare the video decoding performance with data partition optimization.
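The data partitioning mentioned in aspects 2 and 4 can be pictured with a small sketch that splits the macroblock rows of a picture evenly across several accelerator instances. The abstract does not state the partitioning scheme actually used, so the row-based split, the function names, and the accelerator count below are assumptions for illustration only. A real decoder would also have to respect dependencies between neighbouring macroblocks (for example in intra prediction and deblocking), which this sketch ignores.

```python
from typing import List, Tuple

MB_SIZE = 16  # an H.264 macroblock covers 16x16 luma samples


def macroblock_rows(height_pixels: int) -> int:
    """Number of macroblock rows, rounding the height up to a multiple of 16."""
    return (height_pixels + MB_SIZE - 1) // MB_SIZE


def partition_rows(total_rows: int,
                   num_accelerators: int) -> List[Tuple[int, int]]:
    """Split rows into contiguous [start, end) ranges.

    Range sizes differ by at most one row; earlier accelerators
    receive the extra rows when the split is uneven.
    """
    if num_accelerators <= 0:
        raise ValueError("num_accelerators must be positive")
    base, extra = divmod(total_rows, num_accelerators)
    ranges = []
    start = 0
    for i in range(num_accelerators):
        size = base + (1 if i < extra else 0)
        ranges.append((start, start + size))
        start += size
    return ranges
```

For a 1080-line picture, `macroblock_rows(1080)` gives 68 rows, and `partition_rows(68, 3)` yields `[(0, 23), (23, 46), (46, 68)]`. Re-running the split with a different accelerator count mirrors the reconfiguration step in aspect 4, in a much simplified form.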