Design and Implementation of a Configurable Processor for Image Processing
Abstract
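A configurable processor can be configured for a specific application to yield hardware circuits with different computing performance, and it remains programmable. In SoC design, when a configurable processor handles data-intensive computing tasks, it offers more computing power than a general-purpose microprocessor and more flexibility than an ASIC (Application Specific Integrated Circuit), which shortens the development cycle. This thesis designs T*Core, a configurable processor template for image processing based on the Transport-Triggered Architecture (TTA). In use, configuring the relevant parameters for a particular image processing application generates a concrete T*Core processor hardware circuit.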
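The transport-triggered idea mentioned above can be illustrated with a minimal sketch: a program is a list of data moves, and an operation starts as a side effect of writing a function unit's trigger port. The class `AdderFU`, the port names `operand`, `trigger`, and `result`, and the move encoding are all hypothetical and chosen for illustration only; they are not T*Core's actual instruction format.

```python
class AdderFU:
    """Hypothetical function unit with operand, trigger, and result ports."""

    def __init__(self):
        self.operand = 0
        self.result = 0

    def write(self, port, value):
        if port == "operand":
            self.operand = value                # plain latch, no side effect
        elif port == "trigger":
            self.result = self.operand + value  # writing the trigger port starts the add
        else:
            raise KeyError(port)

    def read(self, port):
        if port != "result":
            raise KeyError(port)
        return self.result


def run(moves, units, registers):
    """Execute (source, destination) moves in order.

    A source is an int immediate, ("reg", name), or (unit, port).
    A destination is ("reg", name) or (unit, port).
    """
    for src, dst in moves:
        if isinstance(src, int):
            value = src
        elif src[0] == "reg":
            value = registers[src[1]]
        else:
            value = units[src[0]].read(src[1])
        if dst[0] == "reg":
            registers[dst[1]] = value
        else:
            units[dst[0]].write(dst[1], value)
    return registers


# Three moves compute 5 + 7: the program contains no explicit "add" instruction.
program = [
    (5, ("add", "operand")),
    (7, ("add", "trigger")),
    (("add", "result"), ("reg", "r0")),
]
registers = run(program, {"add": AdderFU()}, {})
```

In this toy model, adding a new function unit only means adding another object with ports, which hints at why a move-based template lends itself to per-application configuration.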
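     This thesis analyzes several basic image processing algorithms: convolution filtering and median filtering, which are common in image enhancement; the discrete cosine transform, which is common in image compression; and image scaling. It summarizes the characteristics of these algorithms to provide a basis for customizing the T*Core function units. The design of T*Core is described in detail, including its internal structure, instruction format and pipeline, data path, the design of each function unit, the immediate storage mechanism, and the addressing of internal resources. As the core computing components of T*Core, the function units directly affect the processor's computing performance, and their structure is designed around the characteristics of image processing algorithms. Examples include an add/subtract function unit with a sorting capability, a floating-point multiply-accumulate function unit, a memory access function unit with two-dimensional addressing, and a zero-delay jump control function unit, all of which speed up image processing programs.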
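One plausible reason to pair sorting with an add/subtract unit is that median filtering reduces to repeated compare-exchange steps, and the sign of a subtraction decides each exchange. The sketch below shows that connection in software. It uses an odd-even transposition network, chosen here for simplicity rather than taken from the thesis, and the function names are hypothetical.

```python
def compare_exchange(a, b):
    """Return (min, max); the sign of a - b decides whether to swap."""
    diff = a - b
    return (b, a) if diff > 0 else (a, b)


def median_of_window(values):
    """Sort with n rounds of adjacent compare-exchanges, then take the middle."""
    v = list(values)
    n = len(v)
    for rnd in range(n):
        # Even rounds pair (0,1), (2,3), ...; odd rounds pair (1,2), (3,4), ...
        for i in range(rnd % 2, n - 1, 2):
            v[i], v[i + 1] = compare_exchange(v[i], v[i + 1])
    return v[n // 2]


def median_filter_3x3(image):
    """Apply a 3x3 median filter; border pixels are copied unchanged."""
    h, w = len(image), len(image[0])
    out = [row[:] for row in image]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [image[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = median_of_window(window)
    return out


# A single bright outlier in a flat region is removed by the filter.
noisy = [[10, 10, 10],
         [10, 255, 10],
         [10, 10, 10]]
cleaned = median_filter_3x3(noisy)
```

The network above is not the smallest one known for nine inputs; it is used because every step is the same compare-exchange primitive, which is the operation a sorting-capable subtract unit would provide.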
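     For hardware verification, this thesis builds a hardware SoC verification platform with a C*Core C310 as the main processor and T*Core as the slave processor. The system runs at 30 MHz, and the image processing results are displayed on a QVGA screen, showing that T*Core performs the image processing functions correctly. T*Core is also compared for speed with the general-purpose microprocessors C*Core C310 and ARM926EJ; the results show that, for tasks with the same amount of computation, T*Core executes much faster than the general-purpose microprocessors.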
A configurable processor is configured for a specific application and then implemented as a hardware circuit. It offers better computing performance than a general-purpose processor and, because it is programmable, more flexibility than an ASIC (Application Specific Integrated Circuit). This thesis designs T*Core, a configurable processor template for image processing based on the Transport Triggered Architecture (TTA). A specific processor is generated by configuring the template for a given image processing application.
     This thesis analyzes several image processing algorithms, including convolution filtering, median filtering, the discrete cosine transform (DCT), and image scaling. It describes the design of the T*Core processor in detail, including the processor architecture, instruction format and pipeline, data path, function units (FUs), immediate data handling, and the addressing of internal resources. The function units are the most important computing components of T*Core and strongly influence its performance, so their circuit structure is customized to the image processing algorithms. The customized units include an add/subtract FU with a sorting function, a floating-point multiply-accumulate FU, a load-store FU with two-dimensional addressing of data memory, and a jump control FU without delay. These customized function units increase the speed of image processing programs.
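As a rough software analogue of how two-dimensional addressing and multiply-accumulate combine in a 3x3 filter, the sketch below maps pixel coordinates to a flat memory address and accumulates kernel-weighted pixels. The row-major layout, the `base` offset, and the function names are assumptions made for illustration; the kernel is applied without flipping, which matches convolution only for symmetric kernels.

```python
def address_2d(base, stride, x, y):
    """Map pixel (x, y) to a flat address, assuming row-major storage."""
    return base + y * stride + x


def filter_3x3(memory, base, width, height, kernel):
    """Filter the interior of an image held in flat memory.

    Returns a (height - 2) x (width - 2) list of accumulated values.
    """
    out = []
    for y in range(1, height - 1):
        row = []
        for x in range(1, width - 1):
            acc = 0.0
            for ky in range(3):
                for kx in range(3):
                    addr = address_2d(base, width, x + kx - 1, y + ky - 1)
                    acc += kernel[ky][kx] * memory[addr]  # multiply-accumulate
            row.append(acc)
        out.append(row)
    return out


# A 4x4 image with pixel values 0..15 stored after two unrelated words.
memory = [0, 0] + list(range(16))
identity = [[0, 0, 0],
            [0, 1, 0],
            [0, 0, 0]]
result = filter_3x3(memory, base=2, width=4, height=4, kernel=identity)
```

In software, each pixel fetch costs a multiply and two adds for the address plus a separate multiply and add for the accumulation; dedicated units for both would remove that overhead from the inner loop.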
     For hardware verification, an SoC platform running at 30 MHz is built, with a C*Core C310 as the main processor and T*Core as the co-processor. The image processed by T*Core is displayed on a QVGA screen, and the result shows that T*Core executes the image processing program correctly. The same image processing programs are also run on the general-purpose microprocessors C*Core C310 and ARM926EJ, and the result shows that T*Core is faster than the general-purpose microprocessors when executing the same computing program.
