Buffer structure optimized VLSI architecture for efficient hierarchical integer pixel motion estimation implementation
  • Authors: Haibing Yin; Dong Sun Park; Xiao Yun Zhang
  • Keywords: Motion estimation; VLSI architecture; Data organization; Buffer structure
  • Journal: Journal of Real-Time Image Processing
  • Publication year: 2016
  • Publication date: March 2016
  • Volume: 11
  • Issue: 3
  • Pages: 507–525
  • Author affiliations: Haibing Yin (1)
    Dong Sun Park (2)
    Xiao Yun Zhang (3)

    1. School of Information Engineering, China Jiliang University, Hangzhou, China
    2. Division of Electronics and Information Engineering, Chonbuk National University, Jeonju, Jeonbuk, Korea
    3. Institute of Image Communication and Signal Processing, Shanghai Jiaotong University, Shanghai, China
  • Journal category: Computer Science
  • Journal subjects: Image Processing and Computer Vision
    Multimedia Information Systems
    Computer Graphics
    Pattern Recognition
    Signal, Image and Speech Processing
  • Publisher: Springer Berlin / Heidelberg
  • ISSN: 1861-8219
Abstract
Integer pixel motion estimation (IME) is a crucial and highly complex module in high-definition video encoders. An efficient joint algorithm and architecture design must trade off multiple target parameters, including throughput, logic gate count, on-chip SRAM size, memory bandwidth, and rate-distortion performance. Data organization and on-chip buffer structure are crucial factors in IME architecture design, since they directly affect this multi-target trade-off. In this work, we combine global hierarchical search with local full search to propose a hardware-efficient IME algorithm, and then propose a VLSI architecture with an optimized on-chip buffer structure. The major contributions of this work are: (1) an improved hierarchical IME algorithm with presearch and deliberate data organization, (2) a multistage on-chip reference pixel buffer structure with high data reuse between integer and fractional pixel motion estimation, and (3) a highly reused and reconfigurable processing element structure. The optimized data organization and buffer structure achieve nearly 70% buffer savings, with a PSNR degradation of less than 0.08 dB on average and 0.12 dB in the worst case compared with a full-search-based architecture. At a hardware cost of 336K and 382K logic gates and 20 kB of SRAM, the proposed architecture achieves throughputs of 384 and 272 cycles per macroblock at system frequencies of 95 and 264 MHz for 1080p and QFHD video coding at 30 fps, respectively.
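The abstract names only the general idea of combining a global hierarchical search with a local full search. The authors' algorithm also has a presearch stage and a specific data organization, which the abstract does not detail. The code below does not reproduce that algorithm. It is a minimal sketch of the generic two-level technique:

1. Run an exhaustive SAD search on 2:1 decimated frames.
2. Refine the up-scaled coarse vector with a small full search at full resolution.

The function names, the 2×2-average decimation, the 16×16 block size and the search radii are all illustrative assumptions.

```python
from typing import List, Optional, Tuple

Frame = List[List[int]]          # row-major luma samples: frame[y][x]
Match = Tuple[int, int, int]     # (dx, dy, sad)


def sad(cur: Frame, ref: Frame, bx: int, by: int, dx: int, dy: int, n: int) -> int:
    """Sum of absolute differences between the n x n block at (bx, by) in cur
    and the block displaced by (dx, dy) in ref."""
    return sum(
        abs(cur[by + y][bx + x] - ref[by + dy + y][bx + dx + x])
        for y in range(n)
        for x in range(n)
    )


def downsample2(frame: Frame) -> Frame:
    """2:1 decimation by rounded averaging of each 2x2 group (even dimensions assumed)."""
    return [
        [(frame[y][x] + frame[y][x + 1] + frame[y + 1][x] + frame[y + 1][x + 1] + 2) // 4
         for x in range(0, len(frame[0]), 2)]
        for y in range(0, len(frame), 2)
    ]


def full_search(cur: Frame, ref: Frame, bx: int, by: int, n: int,
                cx: int, cy: int, radius: int) -> Optional[Match]:
    """Exhaustive search over every displacement within +/-radius of (cx, cy).
    Candidates whose block would leave the reference frame are skipped."""
    h, w = len(ref), len(ref[0])
    best: Optional[Match] = None
    for dy in range(cy - radius, cy + radius + 1):
        for dx in range(cx - radius, cx + radius + 1):
            if not (0 <= bx + dx <= w - n and 0 <= by + dy <= h - n):
                continue
            cost = sad(cur, ref, bx, by, dx, dy, n)
            if best is None or cost < best[2]:   # keep the first minimum in scan order
                best = (dx, dy, cost)
    return best


def hierarchical_me(cur: Frame, ref: Frame, bx: int, by: int, n: int = 16,
                    coarse_radius: int = 8, refine_radius: int = 2) -> Match:
    """Two-level search: global full search at half resolution, then a local
    full search at full resolution around the up-scaled coarse vector.
    bx, by and n are assumed even, and the block is assumed to lie inside the frame."""
    # Level 1: block position, block size and search range all halve.
    coarse = full_search(downsample2(cur), downsample2(ref),
                         bx // 2, by // 2, n // 2, 0, 0, coarse_radius)
    assert coarse is not None, "block must lie inside the frame"
    cdx, cdy, _ = coarse
    # Level 2: refine around (2*cdx, 2*cdy); with even bx, by, n that centre is in bounds.
    fine = full_search(cur, ref, bx, by, n, 2 * cdx, 2 * cdy, refine_radius)
    assert fine is not None
    return fine


if __name__ == "__main__":
    # Synthetic test pair: a linear ramp, with cur equal to ref shifted by (6, -4).
    size = 64
    ref = [[x + 100 * y for x in range(size)] for y in range(size)]
    cur = [[(x + 6) + 100 * (y - 4) for x in range(size)] for y in range(size)]
    print(hierarchical_me(cur, ref, 24, 24))
```

The demo uses a deliberately simple test pattern. The coarse level finds the half-resolution vector (3, -2), and the refinement then returns `(6, -4, 0)`. Each level is an ordinary full search, so the hardware appeal comes from the reduced number of candidates rather than from a different cost metric.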
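As a back-of-the-envelope check of the throughput figures in the abstract, the required clock frequency is cycles per macroblock × macroblocks per frame × frame rate. The sketch below rests on three assumptions:

- Macroblocks are 16×16.
- One macroblock completes every `cycles_per_mb` cycles, with no stalls.
- 1080p is coded as 1920×1088 (68 macroblock rows), as is usual in H.264.

The function names are hypothetical.

```python
import math


def macroblocks_per_frame(width: int, height: int, mb_size: int = 16) -> int:
    """Number of macroblocks per frame; partial rows/columns are rounded up (padded)."""
    return math.ceil(width / mb_size) * math.ceil(height / mb_size)


def required_clock_mhz(width: int, height: int, fps: float, cycles_per_mb: int) -> float:
    """Minimum clock (MHz) for a pipeline that finishes one macroblock every
    cycles_per_mb cycles to sustain real-time coding at the given frame rate."""
    mb_per_second = macroblocks_per_frame(width, height) * fps
    return mb_per_second * cycles_per_mb / 1e6


if __name__ == "__main__":
    # 1080 lines round up to 68 macroblock rows; 2160 lines divide evenly into 135.
    print(f"1080p@30: {required_clock_mhz(1920, 1080, 30, 384):.1f} MHz")
    print(f"QFHD@30:  {required_clock_mhz(3840, 2160, 30, 272):.1f} MHz")
```

Running it prints 94.0 MHz for 1080p and 264.4 MHz for QFHD. These values are in line with the 95 and 264 MHz reported above. The abstract does not say what accounts for the small margin at 1080p.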
