Research and Implementation of a Videoconferencing System Terminal Based on the H.261 Protocol
Abstract
With the development of the information society and the further spread of networks and computer technology, demand for high-quality video/audio and other multimedia services over networks keeps growing. Videoconferencing is a multimedia communication technology that lets people in different locations communicate in a real-time, visual, and interactive way over some transmission medium. It saves users time, improves work efficiency, has a very wide range of applications, and has excellent prospects. Two important trends in the development of videoconferencing technology are the transition of the protocols in use from H.320 to H.323 and the shift of encoding/decoding from hardware to software.
    To gain a deep understanding of videoconferencing technology and lay a foundation for the author's subsequent research, and starting from these development trends, this thesis analyzes and compares the protocols within H.323 in detail, selects the H.261, G.723.1, and RTP protocols for in-depth study, and, based on these three protocols, completes a pure-software implementation of a videoconferencing system terminal.
    The terminal consists mainly of four classes: the main-form class, the H261 class, the G7231 class, and the RTP class. The H261 and G7231 classes compress this terminal's audio/video data and decompress the audio/video data of other terminals; the main-form class issues encoding/decoding requests to the H261 and G7231 classes and collects the results; and communication between the main-form class and the RTP class handles this terminal's communication with other terminals and with the server. The body of the thesis elaborates, with reference to the source code, on the key difficulties encountered in implementing the terminal and their solutions.
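The four-class split described above can be sketched as interfaces. This is a minimal, hypothetical sketch only: the class names H261, G7231, and RTP come from the thesis, but every method name, signature, and the stubbed codec bodies here are illustrative assumptions, not the thesis's actual code.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

using Bytes = std::vector<uint8_t>;

class H261 {   // video codec: compress local frames, decompress remote ones
public:
    Bytes encode(const Bytes& rawFrame) { return rawFrame; }  // stub
    Bytes decode(const Bytes& bitstream) { return bitstream; }  // stub
};

class G7231 {  // audio codec (5.3/6.3 kbit/s speech frames)
public:
    Bytes encode(const Bytes& pcm) { return pcm; }      // stub
    Bytes decode(const Bytes& frames) { return frames; }  // stub
};

class RTP {    // packetization and transport to the server and peers
public:
    void send(const Bytes& payload) { sent.push_back(payload); }
    std::vector<Bytes> sent;  // stand-in for the network in this sketch
};

class MainForm {  // drives the codecs and the RTP session
public:
    H261 video; G7231 audio; RTP rtp;
    void onCapturedFrame(const Bytes& f) { rtp.send(video.encode(f)); }
    void onCapturedAudio(const Bytes& a) { rtp.send(audio.encode(a)); }
};
```

The point of the split is that the main-form class never touches bitstream details: it only forwards capture data to a codec and hands the result to RTP.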
    Building on the implemented terminal, this thesis addresses a drawback of the Huffman decoding tree, the method commonly used in video decoders to decode the input bitstream: its low decoding speed. An improved method, two-phase decoding, is proposed; its basic idea applies to decoding any variable-length code (VLC) table.
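The conventional tree walk that the thesis improves on consumes one branch per input bit. A minimal sketch, using a hypothetical three-codeword table ("1", "01", "001" mapping to symbols 0, 1, 2) rather than any real H.261 table:

```cpp
#include <cassert>
#include <cstdint>

// A node of the Huffman decoding tree; sym is -1 for internal nodes.
struct Node {
    int sym;
    const Node* child[2];  // child[bit]
};

// Tree for the toy table, built by hand for clarity.
const Node A{0, {nullptr, nullptr}};   // code "1"
const Node B{1, {nullptr, nullptr}};   // code "01"
const Node C{2, {nullptr, nullptr}};   // code "001"
const Node n00{-1, {nullptr, &C}};     // reached after "00": bit 1 -> C
const Node n0{-1, {&n00, &B}};         // reached after "0":  bit 1 -> B
const Node root{-1, {&n0, &A}};        // first bit:          bit 1 -> A

// Walk one bit at a time (MSB-first word). The per-bit branch is exactly
// the cost the table-lookup methods below amortize away.
int decodeSym(uint32_t w, int& pos) {
    const Node* n = &root;
    while (n->sym < 0) {
        int bit = (w >> (31 - pos)) & 1;
        ++pos;
        n = n->child[bit];
        if (!n) return -1;  // bit pattern is not a valid codeword
    }
    return n->sym;
}
```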
    Two-phase decoding builds on one-phase decoding and trades off construction difficulty, table size, and execution efficiency. The idea of one-phase decoding is to design a structure array sized by the maximum codeword length in the VLC table, using the codeword bits as the address and storing the run and level (run, level) represented by that codeword as members of the structure at that address. Each decoding step takes x bits from the input bitstream, uses their value as an array index, and reads out the structure, completing the decoding of one codeword.
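The one-phase lookup can be sketched as follows. The three-codeword VLC table, the field names, and x = 3 are all illustrative assumptions for the sketch, not taken from the thesis or from H.261:

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// One element of the structure array; field names are illustrative.
struct Entry { int run; int level; int len; bool valid; };

constexpr int X = 3;  // bits fetched per lookup = max codeword length here

// Build the 2^X-entry array for a toy three-codeword VLC table:
// "1" -> (0,1), "01" -> (0,2), "001" -> (1,1). Every X-bit value whose
// prefix matches a codeword maps to that codeword's (run, level).
std::vector<Entry> buildTable() {
    std::vector<Entry> t(1u << X, Entry{0, 0, 0, false});
    auto fill = [&](unsigned code, int len, int run, int level) {
        int pad = X - len;  // surplus low bits
        for (unsigned s = 0; s < (1u << pad); ++s)
            t[(code << pad) | s] = Entry{run, level, len, true};
    };
    fill(0b1, 1, 0, 1);
    fill(0b01, 2, 0, 2);
    fill(0b001, 3, 1, 1);
    return t;
}

// Decode one symbol: take X bits at bit position pos (MSB-first word),
// index the array, then advance pos by the real codeword length only,
// which is equivalent to moving the stream pointer back X - len bits.
Entry decodeOne(uint32_t stream, int& pos, const std::vector<Entry>& t) {
    unsigned idx = (stream >> (32 - pos - X)) & ((1u << X) - 1);
    Entry e = t[idx];
    if (e.valid) pos += e.len;
    return e;
}
```

A single array index replaces the bit-by-bit tree walk, at the cost of an array with 2^x entries.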
    One restriction on one-phase decoding is that, given the difficulty of constructing the structure array and the space it requires, the method suits VLC tables whose maximum codeword length is at most 10 bits.
    Another restriction is that codeword utilization must be high, i.e., most (run, level) pairs should correspond one-to-one with addresses in the structure array, so that the number of (run, level) pairs is close to the number of structures in the array. In Table 5 of the H.261 standard (VLC Table for TCOEFF), however, the maximum codeword length is large (14 bits) and codeword utilization is low: the structure array would need 128 times as many entries as the table has (run, level) pairs. One-phase decoding is therefore unsuitable for this table.
    
    Two-phase decoding solves these problems by performing the one-phase lookup twice.
    The workflow is as follows. First take x bits from the input bitstream and use their value as an address into a pre-built structure array A, retrieving the structure sa at that address. A flag member of sa indicates whether the first lookup succeeded, and a member r of sa holds the difference between x and the actual codeword length. If the flag is 1, the first lookup succeeded: the pointer to the head of the input bitstream moves back r bits, and the (run, level) pair is read out for further processing. If the flag is 0, the codeword is longer than x bits: a further r bits are taken from the bitstream for a second lookup, and the (run, level) pair of the codeword of length x + r is obtained from a second structure array B.
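This workflow can be sketched with a toy table. Everything concrete here is an assumption made for illustration: the four-codeword table ("1", "01", "001", "0001"), x = 2, the field names, and the choice to index the phase-2 array B by the next two bits. The thesis's actual arrays for H.261 Table 5 are larger and constructed differently.

```cpp
#include <cassert>
#include <cstdint>

// Phase-1 entry: back = x - len on a hit; hit == false sends us to phase 2.
struct PEntry { bool hit; int run, level, back; };
// Phase-2 entry for the long codewords sharing the missed prefix.
struct SEntry { bool ok; int run, level, back; };

// Phase-1 array A, indexed by the first x = 2 bits.
const PEntry A[4] = {
    {false, 0, 0, 0},  // 00: codeword longer than x, needs phase 2
    {true,  0, 2, 0},  // 01: "01", exact match
    {true,  0, 1, 1},  // 10: "1" plus 1 surplus bit, push back 1
    {true,  0, 1, 1},  // 11: "1" plus 1 surplus bit
};

// Phase-2 array B for prefix "00", indexed by the next r = 2 bits.
const SEntry B[4] = {
    {false, 0, 0, 0},  // 0000: not a codeword in this toy table
    {true,  2, 1, 0},  // 0001: length 4
    {true,  1, 1, 1},  // 0010: "001" plus surplus bit, push back 1
    {true,  1, 1, 1},  // 0011
};

// Decode one symbol from an MSB-first word; pos advances by the real
// codeword length, so surplus bits are effectively pushed back.
bool decodeTwoPhase(uint32_t w, int& pos, int& run, int& level) {
    unsigned i1 = (w >> (32 - pos - 2)) & 3;  // first x bits
    if (A[i1].hit) {                          // phase 1 succeeded
        run = A[i1].run; level = A[i1].level;
        pos += 2 - A[i1].back;
        return true;
    }
    unsigned i2 = (w >> (32 - pos - 4)) & 3;  // r further bits
    if (!B[i2].ok) return false;
    run = B[i2].run; level = B[i2].level;
    pos += 4 - B[i2].back;
    return true;
}
```

Short (frequent) codewords finish in the first lookup; only the long (rare) ones pay for a second.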
    Because codeword lengths in a VLC table are assigned by the probability of each (run, level) pair, with more probable pairs receiving shorter codewords, the first lookup succeeds with high probability; the occasional second lookup, however, lowers efficiency somewhat. Two-phase decoding is thus a compromise between the difficulty of constructing the structures and decoding efficiency.
    A comparative analysis of the two decoding methods confirms the runtime advantage of two-phase decoding: at the cost of additional memory, it roughly halves the average decoding time. Since video decoding has stringent real-time requirements, trading space for time is worthwhile.
    In summary, this thesis implements in software a videoconferencing system terminal based mainly on the H.261 protocol and proposes a new method for decoding the video input bitstream, two-phase decoding, which is applied in the video decoder with good results. Owing to time constraints the terminal is still far from complete, and the author will improve and refine it in follow-up work.
