Research on Multi-view Video Coding (MVC) Based on the Multi-dimensional Vector Matrix

Original title: 基于多维矢量矩阵的MVC研究
Author: Mu Sen (穆森)
Degree level: Master's
Discipline: Electronics and Communication Engineering
Keywords: multi-view video coding; multi-dimensional vector matrix; DCT transform; multi-dimensional quantization; multi-dimensional scanning
Degree year: 2012
Supervisor: Sang Aijun (桑爱军)
Discipline code: 081001
Degree-granting institution: Jilin University
Thesis submission date: 2012-06-01

Abstract

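With the rapid development of information technology, multimedia communication has become an indispensable part of daily life. In recent years, 3D stereoscopic video in particular has attracted growing attention. From stereoscopic films in cinemas to 3D televisions at home, stereoscopic video offers a new and more realistic visual experience.
     Conventional 3D stereoscopic video imitates human binocular vision: two cameras capture the same object simultaneously from different angles, and during playback the picture from each camera is presented to a different eye, so that the brain forms an impression of depth. To strengthen the stereoscopic effect, more cameras are sometimes used. This is what is called multi-view video.
     The development of video technology is inseparable from video coding, because video carries an enormous volume of data, which is a major obstacle to video communication and distribution. Conventional video coding is relatively mature and its research has reached a performance bottleneck, whereas research on multi-view video coding (MVC) is still at an early stage and much of the theory remains to be completed.
     For this reason, our laboratory proposed the multi-dimensional vector matrix theory, which removes the limitation of the traditional two-dimensional model by extending the concept of a matrix to multiple dimensions. On this basis, a multi-dimensional vector DCT orthogonal transform kernel matrix, built on the discrete cosine transform (DCT), was proposed and its orthogonality was verified. Earlier work used this theory to implement compression coding of color video streams. Because this multi-dimensional model can effectively remove the strong correlation between views when applied to multi-view video, the aim of this thesis is to implement multi-view video compression coding based on the multi-dimensional vector matrix theory, and some results have been obtained.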
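     In this work, the multi-view video source data are first partitioned into blocks to prepare for the DCT orthogonal transform. To exploit the correlation between views, the blocks are then regrouped in view order. The regrouped data form the input to the DCT.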
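     As a rough illustration of this step, the sketch below splits each view's frame into square blocks and then stacks the co-located blocks of all views in view order. The function names (`split_into_blocks`, `regroup_by_view`), the 2x2 block size, and the tiny 4x4 test frames are assumptions made for the example; the abstract does not specify the block size or the data layout used in the thesis.

```python
def split_into_blocks(frame, size):
    """Split a 2-D frame (list of rows) into size x size blocks.

    Assumes the frame dimensions are multiples of `size`.
    Returns a dict keyed by (block_row, block_col).
    """
    rows, cols = len(frame), len(frame[0])
    blocks = {}
    for r in range(0, rows, size):
        for c in range(0, cols, size):
            blocks[(r // size, c // size)] = [row[c:c + size] for row in frame[r:r + size]]
    return blocks


def regroup_by_view(views, size):
    """For each block position, stack the co-located blocks of all views in view order.

    Returns {position: 3-D list indexed as [view][row][col]}.
    """
    per_view = [split_into_blocks(v, size) for v in views]
    return {pos: [blocks[pos] for blocks in per_view] for pos in per_view[0]}


# Hypothetical input: 3 views, each a 4x4 frame whose sample values encode the view index.
views = [[[100 * v + 4 * r + c for c in range(4)] for r in range(4)] for v in range(3)]
groups = regroup_by_view(views, size=2)

print(len(groups))      # number of block positions
print(groups[(0, 1)])   # co-located blocks taken from views 0, 1 and 2
```

     Each regrouped entry is a small three-dimensional array, which is the kind of input a multi-dimensional transform can operate on directly.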
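     Following the mathematical model of the multi-dimensional vector DCT orthogonal transform kernel matrix, the regrouped data undergo a multi-dimensional vector DCT orthogonal transform, after which the coefficients are well concentrated. Based on the distribution of the coefficients, a flexible multi-dimensional quantization is applied to discard information that has little effect on the video image. Finally, differential coding, multi-dimensional scanning, and run-length coding are performed. Decoding is the inverse of the whole encoding process.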
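     The transform and quantization steps can be illustrated with a simplified stand-in. The sketch below does not reproduce the thesis's multi-dimensional vector DCT kernel, whose definition is not given in this abstract; it uses the standard orthonormal DCT-II applied separably along the view, row, and column axes, followed by uniform quantization with a single step size. The helper names, the 2x2x2 block, and the step size are all assumptions for illustration.

```python
import math


def dct_matrix(n):
    """Orthonormal n x n DCT-II kernel: C[k][i] = a(k) * cos((2i + 1) * k * pi / (2n))."""
    return [[math.sqrt((1 if k == 0 else 2) / n)
             * math.cos((2 * i + 1) * k * math.pi / (2 * n))
             for i in range(n)] for k in range(n)]


def is_orthogonal(c, tol=1e-9):
    """Check numerically that C * C^T is the identity matrix."""
    n = len(c)
    return all(abs(sum(c[a][i] * c[b][i] for i in range(n)) - (a == b)) < tol
               for a in range(n) for b in range(n))


def dct3(block):
    """Separable 3-D DCT of block[view][row][col], one axis at a time."""
    nv, nr, nc = len(block), len(block[0]), len(block[0][0])
    cv, cr, cc = dct_matrix(nv), dct_matrix(nr), dct_matrix(nc)
    # Transform along the column axis.
    t = [[[sum(cc[k][i] * block[v][r][i] for i in range(nc)) for k in range(nc)]
          for r in range(nr)] for v in range(nv)]
    # Transform along the row axis.
    t = [[[sum(cr[k][i] * t[v][i][c] for i in range(nr)) for c in range(nc)]
          for k in range(nr)] for v in range(nv)]
    # Transform along the view axis.
    return [[[sum(cv[k][i] * t[i][r][c] for i in range(nv)) for c in range(nc)]
             for r in range(nr)] for k in range(nv)]


def quantize(coeffs, step):
    """Uniform quantization: divide by the step and round to the nearest integer."""
    return [[[round(x / step) for x in row] for row in plane] for plane in coeffs]


# Hypothetical block: two views that happen to show identical content.
block = [[[1, 2], [3, 4]], [[1, 2], [3, 4]]]
coeffs = dct3(block)
levels = quantize(coeffs, step=2)

print(is_orthogonal(dct_matrix(8)))
print(levels)
```

     Because the two views in this toy block are identical, every coefficient in the second plane along the view axis is numerically zero, which is a small-scale picture of how a transform across views can remove inter-view redundancy.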
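     Finally, the scheme was implemented in VC++ 6.0, and compression performance under different quantization conditions was examined. Comparison with the results of earlier studies confirms that the scheme is effective and feasible. Research on multi-view video compression coding based on the multi-dimensional vector matrix theory is still in its early stages, and the specific coding methods used in many steps of the scheme leave considerable room for improvement. It is hoped that this work lays a foundation for further study.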
With the rapid development of information technology, multimedia communication is increasingly becoming an indispensable part of human life. Especially in recent years, the development of 3D stereoscopic video has drawn more attention. From 3D stereoscopic movies in the cinema to 3DTV for home use, stereoscopic video undoubtedly brings a brand-new, more realistic visual experience.
     Common 3D stereoscopic video simulates the human eyes: two cameras shoot the same object from different angles at the same time. At the playback stage, the pictures shot by the two cameras are given separately to the two eyes, forming a stereoscopic impression in the mind. To achieve a stronger stereoscopic effect, more cameras are sometimes used. This is what we call multi-view video.
     The development of video technology is inevitably linked with video coding, because video usually contains an enormous amount of data, which has become a great barrier to video communication and transmission. Traditional video coding techniques have reached a mature stage, and their research has hit a performance bottleneck. However, research on multi-view video coding (MVC) is still at an early stage, and many theories need to be improved.
     Therefore, our lab proposed the innovative multi-dimensional vector matrix theory. It breaks the limitation of the traditional 2-dimensional model and extends the concept of the matrix to multiple dimensions. Based on this, we proposed the multi-dimensional vector DCT orthogonal transform kernel matrix, built on the discrete cosine transform (DCT), and verified its orthogonality. In previous studies, we realized the coding of color video streams based on this theory. Considering that this multi-dimensional model can effectively remove the strong correlation among views when processing multi-view video, the purpose of this thesis is to realize multi-view video coding based on the multi-dimensional vector matrix theory, and some results have already been achieved.
     In this thesis, we first partition the multi-view video source data into blocks in preparation for the DCT orthogonal transform; considering the correlation among views, we then reorganize the blocked data in view order. The reorganized data constitute the original data for the DCT.
     According to the mathematical model of the multi-dimensional vector DCT orthogonal transform kernel matrix, we apply the multi-dimensional vector DCT orthogonal transform to the reorganized data; the transformed coefficients are well concentrated. According to the distribution of the coefficients, we use a more flexible multi-dimensional quantization method to remove information that has little effect on the video. The final steps are differential encoding, multi-dimensional scanning, and run-length encoding. The decoding process is the reverse of the whole coding process.
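     The final steps named above, differential encoding and run-length encoding, can be sketched generically as follows. This is a minimal illustration, not the thesis's coder: it assumes the quantized coefficients have already been scanned into a one-dimensional list (the multi-dimensional scan order is not described in this abstract), and the sample values are made up for the example.

```python
def differential_encode(values):
    """Replace each value with its difference from the previous one (first value is kept)."""
    return [v - p for v, p in zip(values, [0] + values[:-1])]


def differential_decode(diffs):
    """Invert differential_encode by accumulating the differences."""
    out, total = [], 0
    for d in diffs:
        total += d
        out.append(total)
    return out


def run_length_encode(seq):
    """Collapse runs of equal symbols into [value, run_length] pairs."""
    runs = []
    for x in seq:
        if runs and runs[-1][0] == x:
            runs[-1][1] += 1
        else:
            runs.append([x, 1])
    return runs


def run_length_decode(runs):
    """Invert run_length_encode."""
    return [x for x, n in runs for _ in range(n)]


# Hypothetical DC levels of four neighbouring blocks, coded differentially.
dc_levels = [40, 42, 41, 41]
dc_diffs = differential_encode(dc_levels)

# Hypothetical scanned coefficient levels of one block, with a long run of zeros.
scanned = [4, -1, -1, 0, 0, 0, 0, 0]
runs = run_length_encode(scanned)

print(dc_diffs)
print(runs)
```

     Both steps are lossless, so decoding simply applies `run_length_decode` and `differential_decode` in the reverse order of encoding.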
     Finally, we implemented the scheme in the VC++ environment, examined the compression performance under different quantization conditions, and verified the effectiveness and feasibility of the scheme. Research on multi-view video coding based on the multi-dimensional vector matrix theory is still in its infancy, and many steps in this scheme have considerable room for improvement. We hope that this research lays a good foundation for further study.

