Research on Splitting and Merging Techniques for Digital Audio/Video Bitstreams

English title: Research on the Digital Video/audio Splitting and Merging Technology
Author: 翁超 (Weng Chao)
Thesis level: Master's
Discipline: Signal and Information Processing
Keywords: digital video/audio; nonlinear editing; P2; MXF; MP4; AVCI; DV; cluster transcoding; MPEG-2; transport stream (TS); splitting granularity; GOP
Degree year: 2010
Advisor: 王兴东 (Wang Xingdong)
Discipline code: 081002
Degree-granting institution: Shanghai Jiao Tong University (上海交通大学)
Submission date: 2010-01-01

Abstract

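As digital audio/video compression technology advances and multimedia services are upgraded, techniques for splitting and merging audio/video bitstreams are finding increasingly wide use. This thesis studies these techniques in two application settings: material editing and cluster transcoding.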
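     In material editing, splitting and merging amount to implementing the nonlinear-editing "cut" and "splice" operations on audio/video material that has a complete organizational structure. Targeting the P2 family of material, the mainstream format in current HD nonlinear-editing production, the thesis discusses in turn how to split and merge high-bit-rate MXF material and low-bit-rate MP4 material. For high-bit-rate MXF files in the DV and AVCI formats, both of which use intra-frame compression, the difficulties are parsing and preserving the original metadata and handling large files efficiently. The thesis describes in detail the procedure for parsing the metadata and locating the audio/video data in such files, proposes a multi-threaded rewriting scheme, and determines a suitable rewrite block size by experiment, which effectively shortens task time. For low-bit-rate MP4 files, which use inter-frame compression, the thesis proposes a frame-accurate splitting scheme based on frame-type conversion, covering both the low-delay mode and streams that contain bidirectionally predicted (B) frames. Compared with the conventional scheme of full decoding, splitting, and re-encoding, it has two advantages: because only the frames near the split point undergo frame-type conversion and no full-range decoding and encoding is needed, task time is effectively reduced; and the picture-quality degradation caused by full decoding and re-encoding is avoided.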
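The abstract does not spell out how the multi-threaded rewriting scheme is organized, so the following is only a minimal sketch under one assumption: a reader thread and a writer thread joined by a bounded queue, with the block size as a tunable parameter. The function name `copy_range` and the parameters `block_size` and `depth` are hypothetical and do not come from the thesis.

```python
import queue
import threading


def copy_range(src_path, dst_path, start, end, block_size=1 << 20, depth=4):
    """Copy bytes [start, end) of src_path into dst_path.

    Assumed design (not taken from the thesis): one thread reads blocks,
    another writes them, and a bounded queue lets reads and writes overlap.
    """
    blocks = queue.Queue(maxsize=depth)  # at most depth blocks held in memory
    errors = []

    def reader():
        try:
            with open(src_path, "rb") as src:
                src.seek(start)
                remaining = end - start
                while remaining > 0:
                    chunk = src.read(min(block_size, remaining))
                    if not chunk:  # source is shorter than the requested range
                        break
                    blocks.put(chunk)
                    remaining -= len(chunk)
        except OSError as exc:
            errors.append(exc)
        finally:
            blocks.put(None)  # sentinel: no more blocks will follow

    def writer():
        dst = None
        try:
            dst = open(dst_path, "wb")
        except OSError as exc:
            errors.append(exc)
        while True:
            chunk = blocks.get()  # keep draining so the reader never blocks
            if chunk is None:
                break
            if dst is not None and not errors:
                try:
                    dst.write(chunk)
                except OSError as exc:
                    errors.append(exc)
        if dst is not None:
            dst.close()

    threads = [threading.Thread(target=reader), threading.Thread(target=writer)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    if errors:
        raise errors[0]
```

Timing this function over several `block_size` values on a large file is one way to run the kind of block-size experiment the abstract mentions; the best value depends on the storage hardware.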
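     In the cluster environment, the emphasis is on a multi-granularity splitting scheme and on algorithms and schemes for smoothly merging the resulting sub-clips, so that the cluster transcoding system can pool its computing resources effectively and complete transcoding tasks. Taking the workflow of the cluster transcoding system into account, the thesis analyzes the shortcomings of having the transcoding management server physically split the audio/video material and proposes a marker-based quasi-splitting scheme. For the commonly used MPEG-2 transport stream format, it discusses how to demultiplex the material, record markers, and divide the task, and settles on a GOP-based splitting strategy. It then focuses on how to merge and multiplex the material clips so that audio and video are resynchronized. Finally, experiments on a cluster transcoding system with 7 computing nodes examine how splitting granularity affects transcoding performance, and a suitable splitting granularity is proposed.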
Splitting and merging technology for video/audio material is being applied more and more widely as digital video/audio compression develops and multimedia services are enhanced. This dissertation focuses on the algorithms and implementations of video/audio splitting and merging from two specific points of view: material editing and cluster transcoding.
     In the material editing environment, the splitting and merging implementation emphasizes how to perform the "cut" and "splice" operations of nonlinear editing on video/audio material. Targeting P2, the current mainstream format family for HD nonlinear editing, the dissertation discusses splitting and merging for high-rate MXF files and low-rate MP4 files in turn. For intra-frame compressed MXF files in the DV and AVCI formats, the key issues are parsing and preserving the metadata and achieving efficiency on large files. The processes of metadata parsing and video/audio data location for such material are described in detail. A multi-threaded file I/O scheme is proposed to reduce processing time, and experiments on the rewriting block size lead to a recommended size. For inter-frame compressed low-rate MP4 files, a scheme based on frame-type conversion is proposed that achieves frame-accurate operations in both the low-delay case and the case containing B-frames. Compared with the traditional scheme of decoding, splitting, and re-encoding the whole file, it has the following advantages: processing time is reduced dramatically because the whole file no longer has to be decoded and re-encoded, and the quality degradation introduced by re-encoding is avoided.
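The abstract gives no algorithmic detail for the frame-type conversion scheme. As a rough illustration of why only frames near the split point need work, the sketch below handles the simplest case only: a low-delay stream containing just I- and P-frames, where the first kept frame must become an I-frame. The function `plan_cut_in` and its return convention are hypothetical; the B-frame case and the actual decoding and re-encoding are codec-specific and are not shown.

```python
def plan_cut_in(frame_types, cut_in):
    """Return (depends_on, convert) for a cut that keeps frames from cut_in on.

    frame_types: list of "I"/"P" strings (low-delay stream, an assumption).
    depends_on:  index of the I-frame that frame cut_in is predicted from;
                 frames depends_on..cut_in are the only ones to decode.
    convert:     True if frame cut_in must be re-encoded as an I-frame.
    """
    if not 0 <= cut_in < len(frame_types):
        raise IndexError("cut_in is outside the stream")
    if any(t not in ("I", "P") for t in frame_types):
        raise ValueError("this sketch only handles I/P-only streams")
    if frame_types[cut_in] == "I":
        return cut_in, False  # already a clean entry point, nothing to convert
    # Walk back along the prediction chain to the I-frame that starts it.
    depends_on = cut_in
    while depends_on >= 0 and frame_types[depends_on] != "I":
        depends_on -= 1
    if depends_on < 0:
        raise ValueError("no I-frame precedes the cut point")
    return depends_on, True
```

For a stream `list("IPPPIPPP")` and a cut at frame 6, the function reports that frames 4 to 6 must be decoded and frame 6 converted, while every other frame can be copied without re-encoding.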
     In the cluster transcoding environment, the splitting and merging implementation focuses on splitting video/audio material at multiple granularities and merging the split clips into one smooth, integrated stream, so that the cluster transcoding system can pool computing resources effectively and complete tasks efficiently. Considering the service process of the cluster transcoding system, the dissertation first analyzes the shortcomings of the physical splitting scheme and then proposes a quasi-splitting scheme based on point recording that fits the system's service process. The demultiplexing and task-splitting schemes for MPEG-2 transport streams are discussed in detail, and a splitting strategy based on the GOP (Group of Pictures) is adopted. The dissertation then discusses in detail the clip merging and multiplexing algorithms that guarantee resynchronization between video and audio. Finally, experiments on a cluster system with 7 computing nodes investigate the impact of splitting granularity on cluster transcoding performance.
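To make the idea of quasi-splitting with recorded points concrete, here is a small sketch under stated assumptions: the demultiplexing pass is assumed to have produced a sorted list of byte offsets at which GOPs start, and tasks are formed by grouping a fixed number of consecutive GOPs into byte ranges, without copying any data. The function `plan_tasks` and the parameter `gops_per_task` are hypothetical names; audio handling and timestamp bookkeeping, which the thesis treats under merging and resynchronization, are left out.

```python
def plan_tasks(gop_offsets, stream_size, gops_per_task):
    """Group consecutive GOPs into byte ranges [start, end).

    gop_offsets:   sorted byte offsets where each GOP starts (assumed to
                   come from a prior demultiplexing/marking pass).
    stream_size:   total size of the stream in bytes.
    gops_per_task: splitting granularity, in GOPs per transcoding task.
    """
    if gops_per_task < 1:
        raise ValueError("gops_per_task must be at least 1")
    offsets = list(gop_offsets)
    if not offsets:
        return []
    if offsets != sorted(offsets) or offsets[-1] >= stream_size:
        raise ValueError("offsets must be sorted and lie inside the stream")
    starts = offsets[::gops_per_task]   # first GOP of every task
    ends = starts[1:] + [stream_size]   # each task ends where the next begins
    return list(zip(starts, ends))
```

Varying `gops_per_task` changes how many tasks are produced for the same material, which is the granularity knob whose effect on transcoding performance the experiments examine.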

