Research on High-Performance, Cycle-Accurate Instruction Simulation Techniques for the C67X DSP
Abstract
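DSPs are high-performance processors designed for compute-intensive workloads and are widely used across embedded systems, for example in audio and video encoding and decoding and in image analysis and processing. As the technology advances, DSP hardware structures, instruction sets, and pipelines keep growing more complex, which makes it increasingly difficult to build a DSP simulation platform that is both highly accurate and fast. This thesis focuses on simulation strategies for the VLIW pipeline of the TMS320C67X family and on performance optimization techniques for their implementation.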
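     Starting from the characteristics of the VLIW architecture, the thesis first analyzes the shortcomings of the sequential (in-order) pipeline model in areas such as delay-slot simulation and pipeline-stall simulation, then adopts a reverse-order pipeline model as the basis for accurate pipeline simulation. The reverse-order model simulates the pipeline stages serially in reverse, that is, the instruction execute stages are simulated first and the instruction fetch stages last, which removes the shortcomings of the sequential model and improves the accuracy of pipeline simulation. The thesis then analyzes the performance bottlenecks in the pipeline model and proposes two optimizations: an instruction decode cache and a ring queue of instruction-execution information. On this basis, with the optimized reverse-order pipeline model at its core, an experimental C67X instruction simulation platform, TIC67Xsim, is designed and implemented; it provides instruction simulation, memory simulation, register simulation, and object-file loading.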
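The reverse-order idea can be illustrated with a minimal sketch. It is not the TIC67Xsim implementation: it assumes a simplified three-stage pipeline (fetch, decode, execute), whereas the real C67X pipeline has more phases, and the names `ReversePipeline`, `step`, and `run` are hypothetical. Because execute is simulated first, a stall is known before the earlier stages move, and each stage reads its input latch before the upstream stage overwrites it.

```python
from dataclasses import dataclass, field
from typing import List, Optional, Tuple

# Hypothetical instruction form: (name, cycles spent in execute, must be >= 1).
Instr = Tuple[str, int]


@dataclass
class ReversePipeline:
    program: List[Instr]
    pc: int = 0
    cycle: int = 0
    busy: int = 0                         # remaining execute cycles of the current instruction
    fetch_latch: Optional[Instr] = None   # written by fetch, read by decode
    decode_latch: Optional[Instr] = None  # written by decode, read by execute
    retired: List[Tuple[str, int]] = field(default_factory=list)

    def step(self) -> None:
        """Simulate one clock cycle, visiting the stages in reverse order."""
        self.cycle += 1

        # 1. Execute stage first.
        if self.decode_latch is not None:
            name, latency = self.decode_latch
            if self.busy == 0:
                self.busy = latency
            self.busy -= 1
            if self.busy > 0:
                # Stall: decode and fetch hold their latches this cycle.
                return
            self.retired.append((name, self.cycle))
            self.decode_latch = None

        # 2. Decode stage: its input latch has not been overwritten yet.
        if self.fetch_latch is not None:
            self.decode_latch = self.fetch_latch
            self.fetch_latch = None

        # 3. Fetch stage last.
        if self.pc < len(self.program):
            self.fetch_latch = self.program[self.pc]
            self.pc += 1


def run(program: List[Instr]) -> List[Tuple[str, int]]:
    """Run until every instruction has retired; return (name, retire cycle) pairs."""
    pipe = ReversePipeline(list(program))
    while len(pipe.retired) < len(program):
        pipe.step()
    return pipe.retired
```

For the program `[("A", 1), ("B", 2), ("C", 1)]`, `run` retires A in cycle 3, B in cycle 5 (one stall cycle), and C in cycle 6. A sequential model would need double-buffered latches or a separate stall pass to get the same timing.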
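     The experiments use the Whetstone benchmark, the Dhrystone benchmark, and a Chebyshev low-pass digital filter algorithm as test cases. The results show that the platform simulates the C67X instruction set correctly and achieves relatively high performance, so it can serve as an experimental platform for verifying application programs and for adding custom extensions.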
DSPs, high-performance processors for compute-intensive workloads, are increasingly used in many areas of embedded systems. As the technology develops, DSP hardware architectures, instruction sets, and pipelines become more and more complex, which makes simulation considerably harder. This paper focuses on simulating the TMS320C67X pipeline and on improving simulation performance.
     The paper first introduces the characteristics of the VLIW architecture and then analyzes the defects of the sequential pipeline simulation model in handling delay slots, pipeline stalls, and similar cases. We therefore use a reverse-order pipeline simulation model for instruction set simulation. The model simulates the instruction execution stages first and the instruction fetch stages last. We also present the rationale for the model and its benefits. Furthermore, the paper analyzes the performance bottlenecks that may arise in implementing the model. Based on this analysis, we design two mechanisms to optimize the performance of the pipeline model. Following the hardware features, we design and implement an instruction simulation module, a memory module, a register module, and a file-loading module, which together with the optimization module make up the entire platform.
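The two optimization mechanisms are named in the abstract as an instruction decode cache and a ring queue of instruction-execution information. The sketch below shows one plausible reading of each and is not taken from the thesis: `DecodeCache` memoizes decoded instructions by address (assuming code is not self-modifying), and `WritebackRing` holds register results until their delay slots have elapsed. The word encoding in `toy_decode` is invented for illustration and is not the real C67X format; all class and function names are hypothetical.

```python
from typing import Callable, Dict, List, Tuple

# Hypothetical decoded form: (mnemonic, destination register, delay slots).
Decoded = Tuple[str, int, int]

# Illustrative opcode table; the delay-slot counts are assumptions of this sketch.
_OPS = {0: ("ADD", 0), 1: ("MPY", 1), 2: ("LDW", 4)}


def toy_decode(word: int) -> Decoded:
    """Decode an invented encoding: bits 0-3 opcode, bits 4-7 destination."""
    mnemonic, delay = _OPS[word & 0xF]
    return (mnemonic, (word >> 4) & 0xF, delay)


class DecodeCache:
    """Decode each program address once and reuse the result afterwards."""

    def __init__(self, decode: Callable[[int], Decoded]) -> None:
        self._decode = decode
        self._cache: Dict[int, Decoded] = {}
        self.misses = 0

    def lookup(self, addr: int, word: int) -> Decoded:
        entry = self._cache.get(addr)
        if entry is None:
            self.misses += 1
            entry = self._decode(word)
            self._cache[addr] = entry
        return entry


class WritebackRing:
    """Fixed-size ring: the slot d steps ahead of the head holds writes due in d cycles."""

    def __init__(self, max_delay: int) -> None:
        self._slots: List[List[Tuple[int, int]]] = [[] for _ in range(max_delay + 1)]
        self._head = 0

    def schedule(self, delay: int, reg: int, value: int) -> None:
        """Queue a register write that commits `delay` cycles after this one."""
        index = (self._head + delay) % len(self._slots)
        self._slots[index].append((reg, value))

    def tick(self, regs: List[int]) -> None:
        """End of cycle: commit the writes that are due, then advance the head."""
        due = self._slots[self._head]
        for reg, value in due:
            regs[reg] = value
        due.clear()
        self._head = (self._head + 1) % len(self._slots)
```

A loop body that runs many times pays the decode cost once per address, and a write scheduled with delay 4 commits on the fifth `tick` with no per-cycle scan of pending instructions. The ring never allocates after construction, which is the usual reason to prefer it over a growing list.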
     Finally, we use the Whetstone benchmark, the Dhrystone benchmark, and a Chebyshev low-pass digital filter algorithm as the test suite. The results show that the platform not only simulates the instruction set correctly but also achieves high performance. It is an experimental platform particularly well suited to verifying applications and adding custom features.