Research on a Heterogeneous Multi-Core Simulation System for Image Processing
Abstract
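Image information plays an increasingly important role in daily life and social development, and raising image processing speed has become a pressing problem in many fields. Dedicated processor cores are an effective way to accelerate image processing, but they lack task-level parallelism and do not scale, so many image processing systems struggle to meet the growing service and application demands placed on video and image data. To address this problem, this thesis designs and implements a heterogeneous multi-core processor simulation platform for image processing.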
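     First, the thesis analyzes computer simulation technology, application-specific instruction set customization, and multi-core communication mechanisms, and proposes an instruction set customization method based on a cluster classification model. The method classifies the candidate instruction set in order to shrink the large search space encountered during instruction set customization. While analyzing image processing algorithms, the method is applied to turn frequently executed and time-consuming operations into custom instructions. Using the Open Virtual Platform (OVP) API, an existing processor in OVPsim is extended with these instructions, yielding an application-specific instruction set processor core for image processing. Experimental results show that this core performs considerably better than the original processor core.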
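The abstract does not spell out the cluster classification rule, so the following is only a minimal sketch of the general idea under stated assumptions: candidate instructions that use the same set of primitive operations are treated as one class, only the most profitable member of each class is kept, and the survivors are ranked. The field names (`name`, `ops`, `count`, `saved`), the grouping key, and the gain formula are all hypothetical and are not taken from the thesis.

```python
from collections import defaultdict


def select_custom_instructions(candidates, budget):
    """Pick up to `budget` custom instructions from classified candidates.

    Each candidate is a dict with hypothetical fields:
      name  - label of the candidate pattern
      ops   - primitive operations the pattern fuses
      count - how often the pattern executes in the profiled program
      saved - estimated cycles saved per execution
    """
    # Assumed classification rule: same set of primitive ops -> same class.
    classes = defaultdict(list)
    for cand in candidates:
        classes[frozenset(cand["ops"])].append(cand)

    def gain(cand):
        # Assumed benefit estimate: execution count times cycles saved.
        return cand["count"] * cand["saved"]

    # Keep one representative per class, so the search ranges over
    # classes rather than over every individual candidate.
    representatives = [max(group, key=gain) for group in classes.values()]
    representatives.sort(key=gain, reverse=True)
    return [cand["name"] for cand in representatives[:budget]]
```

Grouping first means the final ranking only has to compare one candidate per class, which is the kind of search-space reduction the paragraph describes.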
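     Second, taking the application-specific instruction set processor core as the auxiliary core and the MIPS processor core in the OVPsim multi-core simulator as the main core, the thesis implements a heterogeneous multi-core simulation platform for image processing. Based on the characteristics of image processing algorithms, it studies communication mechanisms for heterogeneous multi-core processors and designs a multi-core communication model built around a Communication Control Unit (CCU), which uses a mailbox module and a DMA module to speed up the communication-intensive and compute-intensive tasks that arise in image processing. The CCU is implemented with OVP's Behavior Model (BHM) and Peripheral Programming Model (PPM) APIs and is used to extend the heterogeneous multi-core simulation platform. Comparative experiments on the communication models show that the CCU-based scheme is 11.3% more efficient than the classic CELL heterogeneous multi-core communication scheme.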
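To make the division of labor between the mailbox and DMA modules concrete, here is a toy software model, not the thesis's CCU and not OVP API code. It assumes that short control messages travel through a bounded mailbox while bulk pixel data moves by a DMA-style block copy. The class names, the message layout `(op, offset, length)`, and the `"invert"` operation are all hypothetical.

```python
from collections import deque


class Mailbox:
    """Bounded FIFO for short control messages between cores."""

    def __init__(self, depth=4):
        self.depth = depth
        self.queue = deque()

    def post(self, msg):
        if len(self.queue) >= self.depth:
            return False  # mailbox full: the sender has to retry later
        self.queue.append(msg)
        return True

    def fetch(self):
        return self.queue.popleft() if self.queue else None


def dma_copy(src, src_off, dst, dst_off, length):
    """Bulk transfer, modeled here as a single slice copy."""
    dst[dst_off:dst_off + length] = src[src_off:src_off + length]


class CCU:
    """Hypothetical communication control unit: two mailboxes plus a local store."""

    def __init__(self, shared_mem, local_size):
        self.shared = shared_mem          # memory visible to the main core
        self.local = [0] * local_size     # auxiliary core's local store
        self.to_aux = Mailbox()
        self.to_main = Mailbox()


def aux_core_step(ccu):
    """Handle one command; assumes length fits in the local store."""
    cmd = ccu.to_aux.fetch()
    if cmd is None:
        return
    op, off, length = cmd
    dma_copy(ccu.shared, off, ccu.local, 0, length)    # pull data in
    if op == "invert":
        for i in range(length):
            ccu.local[i] = 255 - ccu.local[i]
    dma_copy(ccu.local, 0, ccu.shared, off, length)    # push result back
    ccu.to_main.post(("done", off, length))            # notify the main core
```

In this model the main core would call `ccu.to_aux.post(("invert", offset, length))` and later poll `ccu.to_main.fetch()` for the completion message, so only a few words cross the mailbox while the pixel data moves in one block.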
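     Finally, the thesis analyzes the image processing algorithms of a fast paper counting system based on gap analysis and takes that system's time-consuming image processing modules, such as median filtering and morphological operations, as the test program set for simulation experiments on the platform. The test set is run on three configurations: single-core, homogeneous multi-core, and heterogeneous multi-core. The experiments show that the scalable heterogeneous multi-core simulation platform designed and implemented here substantially improves image processing speed.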
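For readers unfamiliar with the two kernels named as test workloads, the sketch below shows a 3x3 median filter and a 3x3 binary erosion in plain Python. It only illustrates what these operations compute. The window size, the border handling, and the function names are assumptions, and this is not the paper counting system's code.

```python
def median_filter_3x3(img):
    """3x3 median filter on a 2D list of ints; border pixels are copied unchanged."""
    h, w = len(img), len(img[0])
    out = [row[:] for row in img]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            window = [img[y + dy][x + dx]
                      for dy in (-1, 0, 1) for dx in (-1, 0, 1)]
            out[y][x] = sorted(window)[4]  # middle of the 9 sorted values
    return out


def erode_3x3(img):
    """Binary erosion with a 3x3 square element; outside the image counts as 0."""
    h, w = len(img), len(img[0])
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            out[y][x] = int(all(
                0 <= y + dy < h and 0 <= x + dx < w and img[y + dy][x + dx] == 1
                for dy in (-1, 0, 1) for dx in (-1, 0, 1)))
    return out
```

Both kernels apply the same small computation independently at every pixel, which is why they are natural candidates for custom instructions and for splitting across cores.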
