Research on CORDIC-Based Fast Algorithms for Discrete Trigonometric Transforms and Their Implementation
Abstract
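Discrete trigonometric transforms (DTTs) play a very important role in information processing, especially video and image processing, and their fast algorithms and hardware implementations have long been an active research topic. Since the release of the new video compression standard H.265/HEVC, DTTs of the conventional typical sizes no longer meet practical requirements, and fast algorithms for large (especially 2^n-point) and variable transform sizes are becoming a research focus in this field.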
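     In video and image processing, hardware implementations that compute the DTTs exactly are essentially mature, so approximate computation has become another effective way to increase computing speed. As users demand ever higher image quality and processing speed, a single coding method can no longer meet application requirements. Video and image compression coding is moving toward hybrid coding with multiple orthogonal transforms, and designing a high-performance unified architecture that can implement several orthogonal transforms is a problem in urgent need of a solution.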
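     To address these issues, this work studies fast algorithms for large (2^n-point) DTTs, their hardware implementation based on a modified unfolded CORDIC, and unified architectures for discrete orthogonal transforms. The main research work is as follows: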
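     1. Fast algorithms for arbitrary 2^n-point DTTs that use CORDIC as the transform kernel are studied. First, odd-even decomposition is used to derive fast algorithms for arbitrary 2^n-point DCT-II and DST-II with CORDIC as the kernel, together with signal flow graphs of uniform, regular structure. Then, fast algorithms and signal flow graphs for DCT-III and DST-III are obtained from the duality principle of orthogonal transforms, which yields a new CORDIC-based radix-2 DTT fast algorithm. Compared with existing algorithms, it performs better in hardware complexity, scalability, pipeline design and modular design, and it has the following notable features: it applies to any 2^n-point DTT; it combines low algorithmic complexity with ease of VLSI implementation; the CORDIC rotation angles form an arithmetic sequence; it has a regular butterfly structure and a uniform scaling factor, which eases pipelining; and it supports in-place computation.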
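     2. Hardware implementation of the DTTs based on unfolded CORDIC processing elements is studied. First, to address the trade-off between iteration count and computational precision in the conventional unfolded CORDIC algorithm, a modified unfolded CORDIC (MCORDIC) is proposed that reduces the number of iterations by 50% at the cost of a very small loss of precision. Then, because the CORDIC rotation angles in the proposed algorithm form an arithmetic sequence, reuse and modular design are applied to greatly reduce the number and types of CORDIC units needed to compute a DTT; in theory, any 2^n-point DTT requires only one type of CORDIC. On this basis, a new design method for DTT systolic arrays is proposed. Arrays designed with this method outperform similar architectures in circuit delay, throughput, pipelining and hardware complexity, and the method resolves both the unsynchronized computation timing caused by different types of processing elements (PEs) and the presence of multiple kinds of arithmetic operations within a PE.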
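     3. Building on the proposed fast algorithms, the intrinsic relationships among the four types of DTT are examined. Because DTTs of the same size share the same CORDIC units, different DTT types can be computed by controlling the signal flow, which leads to a CORDIC-based design method for unified DTT architectures. The method applies to any 2^n-point DTT, supports unified architectures for any combination of the four DTTs, and offers a single transform kernel, simple control circuitry and a high hardware reuse rate. Several representative unified architectures are designed with this method; they outperform existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity and pipeline design. Design methods for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also given.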
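     4. Based on the Haar-DWT/DCT-II unified architecture, a hardware architecture for content-based image compression coding is studied. The architecture uses the just-noticeable-distortion (JND) value of the image as the criterion for selectively applying compression coding. Because the JND is computationally expensive and hard to implement in hardware, an approximate JND algorithm based on the Haar-DWT is proposed; although it yields only an approximation of the JND, it greatly reduces the computational complexity. A reconfigurable DCT-II architecture with two operating modes (approximate and non-approximate computation) is designed. The control scheme for content-based compression coding, the reference position for selecting the operating mode, and the method for choosing the JND threshold are studied. Experimental results show that the compression coding architecture is feasible. The architecture contains no complex arithmetic operations and has very low computational complexity, so it is well suited to VLSI implementation.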
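     Part of the appeal of the Haar-DWT for low-complexity hardware, as in item 4, is that one decomposition level needs only additions, subtractions and shifts. The sketch below shows the standard integer Haar step (the S-transform) on pairs of samples. It is not the approximate JND algorithm of this work, and the function names are illustrative assumptions.

```python
def haar_forward(samples):
    """One level of the integer Haar transform (S-transform).

    Uses only add, subtract and arithmetic shift; `samples` must hold an
    even number of integers. Returns (approximation, detail) lists.
    """
    approx, detail = [], []
    for a, b in zip(samples[0::2], samples[1::2]):
        d = a - b            # detail: difference of the pair
        s = b + (d >> 1)     # approximation: floor((a + b) / 2)
        approx.append(s)
        detail.append(d)
    return approx, detail


def haar_inverse(approx, detail):
    """Exactly undo haar_forward, again with add, subtract and shift only."""
    out = []
    for s, d in zip(approx, detail):
        b = s - (d >> 1)
        a = b + d
        out.extend([a, b])
    return out


if __name__ == "__main__":
    pixels = [10, 6, 3, 8]
    low, high = haar_forward(pixels)
    print(low, high)
    print(haar_inverse(low, high) == pixels)
```

     Because the step is exactly invertible on integers, no multiplier or rounding logic is needed in either direction.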
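     This work proposes a new DTT fast algorithm with CORDIC as the transform kernel, offering new ideas and methods for research on DTT fast algorithms. The VLSI implementations of approximate DTT computation and the unified architectures studied here can meet the current needs of video and image compression and are in line with the future direction of the field. Just as the FFT brought a leap forward for the DFT in practical applications, DTT fast algorithms with FFT-like properties should likewise bring the DTTs into wider use. The work thus has both forward-looking theoretical significance and practical application value.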
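     The FFT-like structure referred to here can be illustrated with the classical odd-even split of the unnormalized DCT-II: the even-indexed outputs of an N-point transform are an N/2-point DCT-II of the folded sums x[n] + x[N-1-n], and the odd-indexed outputs are an N/2-point DCT-IV of the folded differences x[n] - x[N-1-n]. The sketch below is this textbook decomposition, not the CORDIC-based algorithm of this work; the odd half is evaluated directly rather than with rotators, and all function names are illustrative assumptions.

```python
import math


def dct2_direct(x):
    """Unnormalized DCT-II by direct summation (reference, O(N^2))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


def dct4_direct(x):
    """Unnormalized DCT-IV by direct summation."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * (2 * k + 1) / (4 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


def dct2_odd_even(x):
    """DCT-II of a length-2^n sequence via recursive odd-even decomposition."""
    n_pts = len(x)
    if n_pts == 1:
        return [x[0]]
    half = n_pts // 2
    sums = [x[n] + x[n_pts - 1 - n] for n in range(half)]   # butterfly, upper
    diffs = [x[n] - x[n_pts - 1 - n] for n in range(half)]  # butterfly, lower
    out = [0.0] * n_pts
    out[0::2] = dct2_odd_even(sums)   # even-indexed outputs: N/2-point DCT-II
    out[1::2] = dct4_direct(diffs)    # odd-indexed outputs: N/2-point DCT-IV
    return out


if __name__ == "__main__":
    data = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
    fast = dct2_odd_even(data)
    ref = dct2_direct(data)
    print(max(abs(a - b) for a, b in zip(fast, ref)))
```

     Only the even half recurses here; the odd half is where plane rotations appear, which is the part a rotation-based kernel such as CORDIC would take over.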
Discrete trigonometric transforms (DTTs) play a very important role in information processing, especially in video and image processing, and their fast algorithms and VLSI hardware implementations have long been an active research field. Since the new video compression standard H.265/HEVC was published, conventional DTT fast algorithms have been unable to meet application requirements, and large-size (especially 2^n-point) and variable-size fast algorithms are becoming a research focus.
     In video and image processing applications, hardware architectures that compute the DTTs exactly are already mature, so approximate computation is becoming another way to improve computing speed. A single compression coding method cannot meet users' growing demands in video and image processing. Developing hybrid compression coding methods and the corresponding unified architectures for a variety of orthogonal transforms is therefore an urgent problem.
     To address these problems, this work focuses on fast algorithms for large-size (e.g., 2^n-point) DTTs, approximate DTT computation based on the unfolded CORDIC, and unified architectures for the DTTs.
     The major contributions of the work are as follows:
     1. CORDIC-based radix-2 fast algorithms for the DTTs are studied. The CORDIC algorithm is used as the transform kernel, and fast algorithms for any 2^n-point DCT-II and DST-II are developed using odd-even decomposition. On this basis, fast algorithms for any 2^n-point DCT-III and DST-III and their signal flow graphs are derived from the duality principle of orthogonal transforms. New CORDIC-based radix-2 fast algorithms for the DTTs are thus proposed. They outperform existing algorithms in hardware complexity, scalability, pipelinability and modularity. They also have several distinguishing advantages: they suit any 2^n-point size and various types of DTT; they have low hardware complexity and are suitable for VLSI implementation; the CORDIC rotation angles form an arithmetic sequence; the scaling factor is uniform and the data flow regular, which suits pipelining; and they support in-place operation.
     2. Hardware implementation of the proposed DTT fast algorithms is studied. First, to resolve the trade-off between iteration count and computational precision in the traditional unfolded CORDIC algorithm, a modified unfolded CORDIC algorithm (MCORDIC) is developed, which reduces the number of iterations of the unfolded CORDIC by 50% at a small cost in accuracy. Second, because the CORDIC rotation angles in the proposed algorithms form an arithmetic sequence, the number and types of CORDIC units required can be significantly reduced through modular and reusable design. In theory, any 2^n-point DTT can be realized using only one type of CORDIC. A novel approach to designing DTT systolic arrays is then developed. The proposed systolic array is superior to similar structures in latency, throughput, pipelinability and hardware complexity. The approach also resolves two long-standing problems of DTT systolic arrays: unsynchronized computation timing caused by different types of processing elements (PEs), and multiple kinds of arithmetic operations within a PE. Moreover, using row-column decomposition, a 2-D DCT-II/DCT-III structure with high hardware utilization is proposed.
     3. The intrinsic relationships among the proposed CORDIC-based DTTs are used to derive a general design approach for DTT computation that takes full advantage of the properties of trigonometric functions. Because DTTs of the same size have identical CORDIC cells, different DTTs can be computed by controlling the data flow. The approach suits any 2^n-point DTT and any combination of the four kinds of DTT. It offers a single transform kernel, a simple control circuit and a high hardware reuse rate. Several representative unified architectures are then designed. They are superior to existing unified architectures in hardware complexity, control complexity, throughput, scalability, modularity and pipelinability. Design approaches for DWHT/DCT-II and Haar-DWT/DCT-II unified architectures are also developed.
     4. An architecture for data-dependent compression coding is studied on the basis of the Haar-DWT/DCT-II unified architecture. The JND is used as the threshold for selecting the compression mode. Because the conventional JND algorithm is computationally complex and hard to implement in hardware, an approximate JND algorithm based on the Haar-DWT is proposed. Although it yields only an approximate JND value, it sharply reduces the arithmetic complexity. A reconfigurable DCT-II architecture is designed that works in two operating modes (approximate calculation and accurate calculation). The control scheme is then studied, and the reference position and JND threshold for mode selection are chosen on the basis of extensive experiments. Experimental results show that the proposed data-dependent compression coding architecture is feasible. It has very low hardware complexity and no complicated arithmetic elements, so it is well suited to VLSI implementation.
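     One well-known relationship of the kind exploited in item 3 is that a DST-II can be obtained from a DCT-II of the same size purely by rerouting data: alternate the signs of the inputs, apply the DCT-II, and read the outputs in reverse order. The sketch below checks this identity for unnormalized transforms. It only illustrates the idea of sharing one kernel through data-flow control and is not the unified architecture of this work; all function names are illustrative assumptions.

```python
import math


def dct2(x):
    """Unnormalized DCT-II: X[k] = sum x[n] cos(pi (2n+1) k / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.cos(math.pi * (2 * n + 1) * k / (2 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


def dst2(x):
    """Unnormalized DST-II: S[k] = sum x[n] sin(pi (2n+1) (k+1) / (2N))."""
    n_pts = len(x)
    return [sum(x[n] * math.sin(math.pi * (2 * n + 1) * (k + 1) / (2 * n_pts))
                for n in range(n_pts))
            for k in range(n_pts)]


def dst2_via_dct2(x):
    """DST-II computed with the DCT-II kernel plus sign flips and a reversal."""
    flipped = [-v if n % 2 else v for n, v in enumerate(x)]  # (-1)^n * x[n]
    return dct2(flipped)[::-1]                               # reverse outputs


if __name__ == "__main__":
    data = [0.5, -1.0, 2.0, 3.5, 0.0, 1.0, -2.5, 4.0]
    direct = dst2(data)
    shared = dst2_via_dct2(data)
    print(max(abs(a - b) for a, b in zip(direct, shared)))
```

     The only extra hardware this identity implies is sign inversion on alternate inputs and a change of output order, both of which are routing decisions rather than arithmetic.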
     This work proposes DTT fast algorithms that use the CORDIC as the transform kernel, providing new ideas and methods for research on DTT fast algorithms. In the field of video and image compression, the proposed algorithms and architectures meet current application requirements and fit the direction of future development. Just as the Cooley-Tukey FFT advanced the DFT in practical applications, fast algorithms with FFT-like characteristics should make the DTTs more widely used in digital signal processing. These studies therefore have both forward-looking theoretical significance and practical application value.
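     For readers unfamiliar with the kernel named above, the following is a minimal floating-point sketch of the conventional iterative CORDIC in rotation mode, where each micro-rotation corresponds to one shift and one add or subtract in hardware. It is not the MCORDIC of this work; the function name and the default of 16 iterations are assumptions made for illustration.

```python
import math


def cordic_rotate(x, y, angle, iterations=16):
    """Rotate the vector (x, y) counter-clockwise by `angle` radians.

    Conventional rotation-mode CORDIC. Converges for |angle| up to about
    1.74 rad (the sum of all micro-rotation angles atan(2^-i)).
    """
    z = angle          # residual angle still to be rotated
    gain = 1.0         # accumulated CORDIC gain, removed at the end
    for i in range(iterations):
        shift = 2.0 ** -i                # a right shift by i bits in hardware
        d = 1.0 if z >= 0 else -1.0      # rotate toward zero residual angle
        x, y = x - d * y * shift, y + d * x * shift
        z -= d * math.atan(shift)        # table lookup in hardware
        gain *= math.sqrt(1.0 + shift * shift)
    return x / gain, y / gain


if __name__ == "__main__":
    theta = math.pi / 6
    c, s = cordic_rotate(1.0, 0.0, theta)
    print(abs(c - math.cos(theta)), abs(s - math.sin(theta)))
```

     The gain depends only on the number of iterations, so in a fixed-iteration design it is a constant that can be applied once as a scaling factor rather than inside every rotator.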