Research on Key Technologies of Adaptive Processors Based on Dynamic Instruction Sets
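Abstract
Application-specific instruction-set processors (ASIPs) inherit the programming flexibility and short time-to-market of general-purpose processors (GPPs) while also offering the low power consumption and efficient execution of application-specific integrated circuits (ASICs). They benefit people working at different levels, from system development to design, and are therefore attracting growing attention from both academia and industry. However, ASIPs are difficult to design and implement, and the hardest part is building and verifying the toolchain quickly. How to reduce or avoid the extra overhead of ASIP toolchain development is an important problem in applying ASIPs.
     The rise of new technology fields places stricter demands on programs and computation, and frequently changing user requirements make the tasks that processors execute highly dynamic. Traditional processors designed around a static instruction set can no longer meet the needs of these applications. How to design new architectures that satisfy dynamically changing user requirements is an important problem in processor design.
     To address these two problems, this dissertation proposes a solution based on an adaptive ASIP, the Application Specific Adaptive Processor (ASAP). ASAP combines ASIP technology with reconfigurable technology so that the processor can dynamically extend custom instructions to adapt to changing application requirements. At the same time, it keeps the reconfiguration of the underlying hardware transparent to upper-layer software: custom instructions are mapped dynamically without changing the interface to the application, so the existing toolchain can be reused, which reduces the developer's burden and shortens the development cycle.
     The main research work and contributions of this dissertation are as follows: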
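     (1) This dissertation first analyzes the typical ASIP development flow, identifies its key problems, and then seeks solutions to them. First, to address the difficulty of verifying the toolchain, it builds on the common design flow based on architecture description language (ADL) tools and proposes an ADL-based method for verifying the instruction-set specification, which co-verifies the instruction-set specification, the processor model, and the toolchain. Second, to address the heavy burden of toolchain development and highly dynamic user requirements, it proposes ASAP, an adaptive processor architecture based on a dynamic instruction set, which avoids the toolchain problem and adapts to users' dynamic requirements.
     (2) This dissertation analyzes in detail the characteristics of applications and existing profiling techniques and, in the context of the ASAP architecture, designs and implements a configurable hardware profiler (CHP). CHP works loosely coupled with the microprocessor and correctly identifies the hot paths of the target application while occupying few hardware resources. Extensive experiments determine the settings of the key parameters in each component of the profiler. The experiments show that, for applications suitable for optimization, the hot paths found by CHP achieve a coverage of more than 80%, laying a good foundation for instruction-set optimization.
     (3) This dissertation analyzes common instruction-set extension techniques in detail and studies in depth the problems of custom instruction generation and custom instruction selection. First, in the context of the ASAP architecture, it gives a custom instruction generation algorithm that uses data-flow analysis, instruction cluster marking, subgraph enumeration, and subgraph merging to find the set of candidate instructions that satisfy the multiple constraints on custom extension instructions. Experimental data show that the algorithm efficiently finds all non-trivial custom instruction sets of the target application. Then, because the heuristic algorithms commonly used for custom instruction selection cannot find the optimal solution, it gives a greedy heuristic algorithm, GreedyHeur, and ISDE, an algorithm that combines a greedy strategy with differential evolution. Experiments show that GreedyHeur quickly selects a better set of candidate instructions than the original heuristic algorithm, while ISDE, when the constraint on the number of instructions is tight, selects candidate instruction combinations whose performance improvement far exceeds that of the other heuristic algorithms, at low time complexity.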
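As an illustration of the coverage figure quoted in item (2), the following Python sketch computes hot-path coverage from a trace of executed path identifiers. The coverage definition used here (the share of dynamically executed instructions that fall on hot paths), the count threshold, and all names (`hot_path_coverage`, `path_lengths`, `hot_threshold`) are assumptions made for illustration; the abstract does not specify how CHP detects hot paths or how coverage is measured.

```python
from collections import Counter


def hot_path_coverage(path_trace, path_lengths, hot_threshold):
    """Return (hot_paths, coverage) for a trace of executed path IDs.

    Assumed definition: a path is "hot" when it executes at least
    hot_threshold times; coverage is the fraction of executed
    instructions that belong to hot paths.
    """
    counts = Counter(path_trace)
    # Weight each path by the instructions it contributes at run time.
    weight = {p: n * path_lengths[p] for p, n in counts.items()}
    total = sum(weight.values())
    hot = {p for p, n in counts.items() if n >= hot_threshold}
    covered = sum(weight[p] for p in hot)
    return hot, (covered / total if total else 0.0)


# Hypothetical trace: path "A" runs 8 times, "B" and "C" once each.
trace = ["A"] * 8 + ["B"] + ["C"]
lengths = {"A": 10, "B": 5, "C": 5}  # instructions per path (made up)
hot, coverage = hot_path_coverage(trace, lengths, hot_threshold=4)
print(sorted(hot), round(coverage, 3))
```

A hardware profiler would use bounded counters and tables rather than an unbounded dictionary; the sketch only shows what a coverage percentage of this kind measures.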
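     This dissertation also analyzes common reconfigurable array architectures. In the context of the ASAP architecture, it describes the design and implementation of a practical reconfigurable array architecture and, for this architecture, gives a method that uses a hardware table to analyze the register producer-consumer relationships between instructions and thereby map custom instructions automatically.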
Application-specific instruction-set processors (ASIPs) combine the flexibility and competitive time-to-market of embedded processors with the computational performance and energy efficiency of dedicated VLSI hardware implementations. Because they offer advantages to developers at several levels, they are becoming increasingly popular. However, ASIPs are costly to develop, largely because providing and verifying the toolchain is time-consuming. Reducing or eliminating this toolchain-related cost is a significant challenge.
     With the emergence of new technologies and frequently changing user requirements, the demands placed on processors are becoming more challenging. ASIPs designed with a static instruction set struggle to meet these demands, so research into new processor architectures is increasingly important.
     This dissertation presents the framework of an adaptive ASIP based on a dynamic instruction set to address both problems. The adaptive ASIP is meant to avoid the toolchain problems in ASIP development and to meet varying user requirements. It integrates reconfigurable technology with ASIP technology to support dynamic instruction-set extension, and it keeps the application binary interface unchanged during dynamic hardware reconfiguration so that the existing toolchain can be reused.
     The key research and contributions of this dissertation are as follows:
     (1) The ASIP toolchain problem and its solution: we analyze the typical ASIP design flow, point out the existing problems, and propose two solutions. First, we introduce an ADL-based verification methodology for co-verification of the tools, the instruction-set specification, and the CPU model. Second, we describe the framework of the Application Specific Adaptive Processor (ASAP), which avoids the toolchain problem and meets varying user requirements.
     (2) Application characteristics and configurable profiler design: we first analyze the characteristics of application benchmarks and then design a configurable hardware profiler (CHP). CHP is loosely coupled with the embedded processor and identifies hot paths efficiently without consuming many hardware resources. We determine its crucial parameters through extensive experiments. Path-profiling experiments show that the coverage of the hot paths found by CHP is usually above 80%, which provides good opportunities for instruction-set optimization.
     (3) Instruction-set extension and candidate selection algorithms: we first propose a custom instruction-set extension algorithm for ASAP, comprising data-flow analysis, instruction clustering, subgraph enumeration, and subgraph merging. Experiments show that the algorithm enumerates all non-trivial candidates efficiently. We then analyze existing candidate selection algorithms and propose two new ones. Because existing heuristics usually ignore the difference between a custom instruction and its instances, we improve an existing heuristic algorithm into the GreedyHeur algorithm. It calculates each custom instruction's weight from its instances and then selects custom instruction instances greedily according to the weights of their instructions. To find better custom instructions than heuristic algorithms can, we introduce ISDE, an algorithm that integrates a greedy strategy with differential evolution. Simple encoding and efficient fitness evaluation help ISDE find the best combination of custom instructions quickly. Experiments show that our algorithms find better custom instruction candidates more quickly and efficiently than existing heuristic algorithms.
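The instance-based weighting described in item (3) can be sketched as follows in Python. This is not the dissertation's GreedyHeur algorithm, whose details the abstract does not give; it is a minimal greedy selection under stated assumptions: each candidate instruction has a list of instances, each instance has an estimated gain and a set of operation nodes it would replace, an instruction's weight is the sum of its instance gains, and two selected instances may not share a node. The names `greedy_select`, `candidates`, and `max_instructions`, and all example data, are hypothetical.

```python
def greedy_select(candidates, max_instructions):
    """Greedily pick custom instructions and non-overlapping instances.

    candidates: {name: [(gain, frozenset_of_node_ids), ...]}
    Returns (chosen, total_gain), where chosen maps each selected
    instruction name to the instances actually used.
    """
    # Assumed weighting: an instruction's weight is the summed gain
    # of all of its instances.
    weights = {name: sum(g for g, _ in insts)
               for name, insts in candidates.items()}
    # Highest weight first; ties broken by name for determinism.
    order = sorted(weights, key=lambda n: (-weights[n], n))

    covered = set()   # nodes already claimed by a selected instance
    chosen = {}
    total_gain = 0
    for name in order:
        if len(chosen) >= max_instructions:
            break
        picked = []
        for gain, nodes in sorted(candidates[name], key=lambda i: -i[0]):
            if covered.isdisjoint(nodes):
                covered |= nodes
                picked.append((gain, nodes))
                total_gain += gain
        if picked:
            chosen[name] = picked
    return chosen, total_gain


# Hypothetical candidates: gains and node IDs are made up.
candidates = {
    "mac":       [(30, frozenset({1, 2})), (20, frozenset({5, 6}))],
    "shift_add": [(25, frozenset({2, 3})), (10, frozenset({7, 8}))],
    "and_or":    [(15, frozenset({9, 10}))],
}
chosen, total_gain = greedy_select(candidates, max_instructions=2)
print(list(chosen), total_gain)
```

In this example the first `shift_add` instance is skipped because it shares node 2 with a `mac` instance, which is the kind of instruction-versus-instance distinction the text says ordinary heuristics overlook.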
     This dissertation also discusses reconfigurable array architectures and dynamic techniques, and describes a practical reconfigurable array architecture and its dynamic mapping mechanism based on a hardware table.
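One way to picture a table-based analysis of register producer-consumer relationships is the Python sketch below. It is a software analogy under stated assumptions, not the dissertation's hardware design: a table records, for each register, the index of the instruction that last wrote it, and every later read of that register yields a producer-to-consumer edge. The function name, the `(dest, sources)` instruction format, and the example sequence are all hypothetical.

```python
def producer_consumer_edges(instructions):
    """Derive register producer-consumer edges from an instruction list.

    instructions: list of (dest_reg_or_None, [src_regs]).
    Returns a list of (producer_index, consumer_index, register).
    """
    last_writer = {}  # register -> index of its most recent producer
    edges = []
    for idx, (dest, srcs) in enumerate(instructions):
        # Read sources before recording the destination, so an
        # instruction that reads and writes the same register is
        # linked to the earlier producer, not to itself.
        for reg in srcs:
            if reg in last_writer:
                edges.append((last_writer[reg], idx, reg))
        if dest is not None:
            last_writer[dest] = idx
    return edges


# Hypothetical three-instruction sequence.
seq = [
    ("r1", ["r2", "r3"]),  # 0: r1 = r2 + r3
    ("r4", ["r1", "r5"]),  # 1: r4 = r1 << r5
    ("r1", ["r4", "r1"]),  # 2: r1 = r4 ^ r1
]
edges = producer_consumer_edges(seq)
print(edges)
```

Edges of this kind describe which operations feed which, which is the information a mapper would need to place dependent operations on connected elements of an array; registers with no recorded producer (`r2`, `r3`, `r5` here) would be external inputs.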
