DSP编译器关键技术研究
详细信息    本馆镜像全文|  推荐本文 |  |   获取CNKI官网全文
摘要
为了提高特定应用环境下的运行速度,DSP增加了许多特殊的指令和功能单元,体系结构越来越不规则。传统的代码生成算法是一种分治算法,没有考虑指令和寄存器之间的约束关系,难以应用在DSP编译器中。必须为DSP编译器发展出新的代码生成算法,以适应新的需求和挑战。本文主要研究了DSP编译器的若干关键技术,DSP编译器的目标机器平台是浙江大学自主研发的媒体DSP——SPOCK。
     编译器前端包括词法分析器、语法分析器和中间代码生成器等。针对DSP的体系结构特点,重新改造了LCC编译器中记号、符号表、数据类型支持等数据结构,并使LCC前端能够正确的处理计算溢出和数据类型转换等。
     DSP的体系结构复杂多变,拥有高级语言难以表述的诸多特性,代码生成技术是编译器最难、最复杂的技术之一,它从根本上决定了一个编译器的效率和性能。代码生成技术共分三个子问题:指令选择、寄存器分配以及指令调度,传统的分步优化算法生成的目标代码往往是次优代码。本文研究了包括时间约束、资源约束和DSP体系约束等特性,指出必须发展出同时考虑指令选择、寄存器分配和指令调度的新算法,才有可能为DSP生成优化的目标代码。
     根据最优化原理,提出调度DAG的概念,给出了同时考虑指令选择和寄存器分配的代码综合生成算法。该代码生成算法在指令生成过程中,充分考虑的指令和寄存器之间的约束,将代码优化生成的问题转化为在调度DAG中寻找一条优化路径的问题。
     在传统的图染色算法中,仅仅用物理寄存器个数——n表示寄存器堆的模型,这种简单的方式不足以描述DSP寄存器的约束关系。提出了一个能够描述这些约束关系的DSP寄存器模型,该模型将DSP的寄存器分成若干类,并定义了它们之间的相互约束量和优先级别。传统图染色算法只能有效地为通用处理器分配寄存器,改进了图染色算法,将算法的应用范围延伸到DSP领域。提出了有向冲突图的概念,有向边的权值代表寄存器类之间的约束值,寄存器分配的过程就是对有向图进行简化、归约的过程。给出判断有向冲突图节点染色性的精确判据,当冲突图中所有节点都不可染色,算法就选择一个优化的节点溢出到存储器中。
     中间语言反映源语言的结构,又和目标体系相关,具有目标语言的特性。中间语言对编译器的结构和功能影响很大,其形式是多方面考虑的折中。本文基于XML语言,扩展了IBURG的树文法。这种中间语言的好处就是简单明确,描述能力强,能方便地描述DSP体系结构的特征。为DSP编译器的重定位提供了一个良好的机制,简化了DSP编译器充定位的难度。
     LCC是一个重定位的通用处理器编译器系统。采用LCC的前端,应用综合代码生成算法和改进图染色寄存器分配算法,重写了编译器后端,构成了一个DSP编译器。试验证明,该DSP编译器能较好的利用DSP特性,提高了代码生成质量,降低了寄存器溢出的概率。
This dissertation is involved with key compile techniques and implementation of DSP (digital signal processor). The machine platform is a DSP named SPOCK, which is developed by the VLSI, Zhejiang University. To accelerate some applications, many special instructions and functional units are added to DSP. So ISAs of DSPs are more complicated, structures of DSPs more irregular. The traditional code generation algorithms ignore the relation of code selection and register allocation, which results in sub-optimal code generation. It is essential to develop new code generation algorithms for DSP. In this dissertation, the research fields include code selection, register allocation, code schedule of DSP code generation, and the implementation of DSP compiler, and etc.
     Compiler, assembler, linker and simulator constitute a whole compiler system. Compiler techniques are focused on in this dissertation. The compiler front-end includes lexical analysis, syntax analysis and semantic analysis. The symbol data structure, symbol table and the support structure to data type are designed newly to adapt to the architecture of DSP. The intermediate language has not only characteristics of high level language but also characteristics of assembly language. It affects the performance and function of compiler strongly, and it is designed as a tradeoff of many capabilities and requirements. In this DSP compiler, an intermediate language owning 36 operators is adopted, which can clearly and briefly describe abstract syntax tree.
     The architecture of DSP is more irregular than general purpose processor (GPP) and owns many special instructions. The time constraint, resource constraint and architecture constraint are researched in the dissertation, which makes code generation for DSP more difficult. Code selection, register allocation and instruction schedule are sub-phases of code generation, which are coupled with each other. These coupling relations result in traditional methods producing sub-optimality of generated code. To satisfy register restriction of DSP and real-time requirement of applications, a new code-generation algorithm based on dynamic programming is presented, which copes with code selection and register allocation simultaneously. A new concept—schedule directed acyclic graph (DAG) is introduced. The optimal code generation is translated into finding an optimal path in schedule DAG.
     In traditional graph-coloring algorithms, processor register file is described by the number of registers—N. But this simple model can not accurately describe DSP register file. A new model, which divides DSP register file into several register classes and precedence levels, defines the value of constraint, is presented. Traditional graph-coloring register allocation cannot produce optimal code for irregular structure DSP. The traditional graph-coloring algorithm is improved to be adapted to DSP according to DSP register file model. In the new algorithm, a directed interference graph is built by analyzing variables' live range and register constraints. The register allocation is translated into how to simplify this graph.
     The intermediate language, which affects functions of compiler importantly, has the characters of both high level language and the target language. The format of intermediate language of compiler is a tradeoff of many factors. In the dissertation, the grammar of IBURG is extended to describe the irregular architecture of DSP.This grammar based on XML affords a good retargetable mechanism for DSP compiler.
     LCC is a retargetable compiler. Using the front end of the LCC compiler, applying the methods dealing with code selection and register allocation simultaneously, a new backend is developed. Experimental results show it has better performance of code-generation and less register spilling than traditional code-generation algorithm.
引文
[1] K. Keutzer and S. Malik and A. R. Newton. From ASIC to ASIP: the next design discontinuity [C]// Proceedings of the 2002 IEEE International Conference on Computer Design: VLSI in Computers and Processors (ICCD'02). Washington, DC, USA: 2002: 84.
    [2] Keshab K. Parhi. VLSI digital signal processing systems: design and implementation [M]// John Wiley & Sons, Inc, 1999: 509-522.
    [3] Andrew Bateman, Iain Paterson-Stephens. The DSP handbook: algorithms, applications and design techniques [M]// Pearson Education Limited, 2002: 16-77.
    [4] .R. Leupers and P. Marwedel, Retargetable Compiler technology for embedded systems—tools and applications [R]// Kluwer Academic, Boston, 2001.
    [5] A.Kumar M.K.Jain. M.Balakrishnan. ASIP design methodologies: survey and issues [C]// Proceedings of the 14th International Conference on VLSI Design (VLSID'01),2001:76.
    [6] L.K.John D.Talla, V. Lapinskii, B. L. Evans. Evaluating signal processing and multimedia applications on SIMD, VLIW and superscalar architectures [C]// Proceeding of 2000 IEEE International Conference on Computer Design (ICCD'00), 2000: 163.
    [7] Kenneth C. Louden. Compiler construction priciples and practice [M]// PWS publishing company, 1997:1-20
    [8] AHO A V, SETHI R, ULLMAN J D, et al. Compiler: Principles, Techniques and Tools [M]// Boston: Addison-Wesley Longman Publishing Co., Inc, 1986. [9] Araujo G, Malikaraujo S. Code generation for fixed-point DSPs [C]// ACM Transactions on Design Automation of Electronic Systems. New York: ACM, 1998: 136-161
    [10] Backus, J. W, R. J. Beeber, S. Best, et al. The Fortran automatic coding system [C]//Western Joint Computer Conference, 1957: 188-198.
    [11] Using and porting the GNU compiler collection, http://gcc.gnu.org
    [12] Hans-Peter Nilsson. Porting the GNU C compiler to the CRIS architecure [D]. Lund Institute of Technology, 1998.
    [13] Harsh Mehta. A GCC-based cross compiler for single instruction issue, in-order processors [D]. Kanpur, India: Indian Institute of Technology, 2004.
    [14] Hans-Peter Nilsson. Porting GCC for Dunces. ftp.axis.se/pub/users/hp/pgccfd/pgccfd-0.5.pdf
    [15] Robin Miyagi. Introduction to GCC inline asm. http://linuxassembly.org/articles/rmiyagi-inline-asm.txt
    
    [16] Richard M. Stallman. GNU compiler collection internals, 2002. http://gcc.gnu.org
    
    [17] The SUIF system programmer's reference manual. http://suif.stanford.edu/suif/suif2/doc-2.2.0-4/
    
    [18] The SUIF interfaces guide. http://suif.stanford.edu/suif/suif2/
    
    [19] The SUIF programs and passes guide, http://suif.stanford.edu/suif/suif2
    
    [20] Mary W. Hall, Saman P. Amarasinghe, Brian R. Murphy, Shih-Wei Liao, Monica S. Lam. Interprocedural parallelization analysis in SUIF [J]// ACM Transactions on Programming Languages and Systems (TOPLAS), Volume 27 Issue 4, July 2005.
    
    [21] Steven Carroll, Constantine Polychronopoulos. Compilers I: A framework for incremental extensible compiler construction [C]// Proceedings of the 17th Annual International Conference on Supercomputing ICS '03, June 2003.
    
    [22] Andrew A, Jack D, Norman R. The Zephyr compiler infrastructure. http://www.cs .Virginia, edu/zephvr
    [23] Wang D. C, Appel A. W., Korn J. L., Serra C. S. The Zephyr abstract syntax description language [C]// Proceedings of the Conference on Domain-specific Languaeges DSL'97, USENIX Association, 1997:11 -26.
    [24] Norman Ramsey and Jack W. Davidson. Machine descriptions to build tools for embedded systems [C]// In ACM SIGPLAN Workshop on Languages, Compilers, and Tools for Embedded Systems (LCTES'98), June 1998. 172-188.
    [25] Norman Ramsey and Jack W. Davidson. Specifying instructions' semantics using λ-RTL [R]// University of Virginia, July 1999.
    [26] Sungjoon Jung, Yunheung Paek. Compilers and Optimization: The very portable optimizer for digital signal processors [C]// Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems CASES '01, November 2001.
    
    [27] Dai Guilan, Zhang Suqing, Tian Jinlan, Jiang Weidu. Technical correspondence: A study of compiler techniques for multiple targets in compiler infrastructures [C]// Proceedings of the 2001 international conference on Compilers, architecture, and synthesis for embedded systems CASES '01, November 2001.
    [28] Robert Kidd. Using OpenlMPACT [R]// Beijing: Tsinghua University, October 2004.
    [29] Wen-mei Hwu. Open IMPACT [R]// University of Illinois, http://www.gelato.uiuc.edu
    [30] Christopher John Shannon. The IMPACT SC140 code generator [D]. Urbana-Champaign: University of Illinois, 2002.
    
    [31] Pohua P. Chang, Scott A. Mahlke, William Y. Chen, Nancy J. Warter, Wen-mei W. Hwu. IMPACT: an architectural framework for multiple-instruction-issue processors [C]// ACM SIGARCH Computer Architecture News, Proceedings of the 18th annual international symposium on Computer architecture ISCA '91. 1991, 19(3).
    [32] Arch D. Robison. Impact of economics on compiler optimization [C]// Proceedings of the 2001 joint ACM-ISCOPE conference on Java Grande JGI '01, June 2001.
    
    [33] Anup Gangwar, M. Balakrishnan, Anshul Kumar. Impact of intercluster communication mechanisms on ILP in clustered VLIW architectures [J]// ACM Transactions on Design Automation of Electronic Systems (TODAES), 2007(12) 1.
    [34] Christopher W. Fraser and David R. Hanson. A Retarget C Compiler Design and Implementation. Person Education, Inc. 1995.
    [35] David Dowty, Adriane Boyd. Introduction to linux LCC tutorial. www.ling.ohio-state.edu/events/lcc/tutorials/2005-2006/linux-tutorial.intro-hdt.pdf.
    [36] C. W. Fraser and R. R. Henry, BURG - fast optimal instruction selection and tree parsing [J]// SIGPLAN Notices, 1992 vol.27: 491-516.
    [37] Wei Qin.Porting Icc to the PLX architecture 1. Introduction [R]// www.palms.ee.princeton.edu/PLX/
    [38] Wei Qin. Porting LCC to the PLX architecture [R]// www.palms.ee.princeton.edu/PLX/
    [39] C. W. Fraser and D. R. Hanson. A code generation interface for ANSI C [J]// Software-Practice and Experience, 1991 (21)9: 963-988.
    [40] Chris Fraser, Dave Hanson. A lean retargetable C compiler [R]// www.cs.princeton.edu/software/lcc/
    [41] C. W. Fraser and D. R. Hanson. Simple register spilling in a retargetable compiler [J]// Software-Practice and Experience, 1992 (22)1: 85-99.
    [42] Rainer Leupers.Compiler Design Issues for Embedded Processors [C]// IEEE Design & Test of Computers, 2002.
    [43] Rainer Leupers. Embedded tutorial: Code generation for embedded processors [C]// Proceedings of the 13th international symposium on System synthesis ISSS '00, Sep 2000.
    
    [44] Peter Marwedel. Compiler for embedded processors, www.cs.dortmund.edu
    [45] Steven S. Muchnick. Advanced compiler design implementation [M]// Academic Press & China Machine Press. 2003:481-529
    [46] Jian Yang. The design of Audio and Video Decoding oriented SOC platform [D]. Hangzhou, China: 2007.
    [47] Liao Yin. SPOCK instruction manual, 2005.
    [48] SPOCK assembly language writing convention and ABI(Application Binary Interface).
    
    [49] Eilert Johan, Jian Yang. SPOCK architecture spec, Nov 2005.
    [50] SPOCK compiler manual, 2006.
    
    [51] CHAITIN G. J. Register Allocation and Spilling via Graph Coloring [C]// ACM SIGPLAN Notices, 17(6):98-105, June 1982.
    
    [52] BRIGGS P., COOPER K D and TOREZON L. Improvements to Graph Coloring Register Allocation [C]// ACM Transactions on Programming Languages and Systems (TOPLAS), 16(3):428-455, May 1994.
    
    [53] LIAO S, DEVADAS S, KEUTZER K, et al. Code optimization techniques for embedded DSP microprocessors[C]//Design Automation Conference (DAC), 1995: 599-604
    
    [54] LEUPERS R. Code Optimization Techniques for Embedded Processors [M]// Norwell: Kluwer Academic Publishers, 2000.
    
    [55] KUDRIAVTSEV A, KOGGE P. Generation of Permutations for SIMD Processors [C]// Proceedings of the 2005 ACM SIGPLAN/SIGBED, Conference on Languages, Compilers, and Tools for Embedded Systems, 2005:147-156
    [56] Cook S. A. The complexity of theorem proving procedures [C]// Proc. 3rd Annual ACM Symposium on Theory of Computing: 51-158, May 1971
    [57] Karp R. M. Reducibility among Combinatorial Problems [C]// Complexity of Computer Computations, New York: Plenum Press, 1972: 85-103.
    [58] J. Bruno, R. Sethi. Register Allocation for a One-Register Machine [R]// Technical Report No. 157, Computer Science Dept, Pennsylvania State University, October 3,1974.
    [59] Sethi R. and Ullman J. D. The generation of optimal code for arithmetic expressions [J]// Journal of the ACM (JACM), 1970(17):715-728.
    [60] B. Prabhala and R. Sethi. Efficient computation of expressions with common subexpressions [J]//Journal of the ACM, January 1980, 27(1):146-163.
    [61] J. L. Bruno and R. Sethi. The generation of optimal code for stack machines [J]// Journal of the ACM, July 1975,22(3):382-396.
    
    [62] Chang C. M, Chen C. M. and King C. T. Using integer linear programming for instruction scheduling and register allocation in multi-issue processors [J]// Computers Mathematics and Application, 1997,34(9):1-14.
    [63] Aho A. V. and Johnson S. C. Optimal code generation for expression trees [J]// Journal of the ACM, July 1976,23(3): 488-501.
    
    [64] Andrzej Bednarski and Christoph Kessler. Integer linear programming versus dynamic programming for optimal integrated VLIW code generation [C]// In Proc. 12th Int. Workshop on Compilers for Parallel Computers (CPC'06), January 2006.
    
    [65] E. A. Lee. Programmable DSP architectures: Part I [J]// IEEE ASSP Magazine, October 1988: 4-19.
    
    [66] E. A. Lee. Programmable DSP architectures: Part II [J]// IEEE ASSP Magazine, January 1989: 4-14.
    [67] Marwedel and Goosens. Code Generation for Embedded Processors [C]// Kluwer Academic Publishers, Massachusetts, 1995.
    [68] Rainer Leupers.Code Generation for Embedded Processors [C]// IEEE Proc of the 13th Int'l Symp on System Synthesis (ISSS'00), 2000: 173-178.
    [69] Hu Weiping. Key technologies in extensible compiler architecture [D]. Academia Sinica: Institute of Computing, 1998.
    [70] Steven S. Muchnick. Advanced compiler design implementation [M]// Academic Press & China Machine Press, 2003:67-102.
    [71] WHIRL Intermediate Language Specification [R]// Technical Report, Silicon Graphics Inc., 1999.
    [72] J. Ferrante, K. J. Ottenstein and J. D. Wanrren. The program dependence graph and its use in optimization [J]// ACM Transaction on Programming Languages and Systems, July 1987, 9(3):319-349.
    
    [73] D. Weise, R. F. Crew, M. Ernst and B. Steensgarrd. Value dependence graphs: representation without taxation [C]// Proceeding of the 21st ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, January 1994: 297-310.
    
    [74] E. Ruf. Optimizing sparse representations for dataflow analysis [C]// In Proceedings of the ACM SIGPLAN Workshop on Intermediate Representations, January 1995:50-61.
    [75] Qin Zhao. Static resource models for code generationof embedded processors [D]. Eindhoven: Technishe Universiteit, 2003.
    
    [76] M. Bekooij, B. Mesman, J. L. van Meerbergen, and J. A. G. Jess. Constraint analysis for operation assignment in FACTS [C]// In Proceedings of the 11st ProR- ISC/IEEE Benelux Workshop on Circuits, Systems and Signal Processing, Utrecht, The Netherlands, STW Technology Found, November 2000: 229-236,
    [77] D.C. Ku and G. De Micheli. High-Level Synthesis of ASICs under Timing and SynchronizationConstraints [D]// Kluwer Academic Publishers, Dordrecht, 1992.
    [78] W. P. M. Nuijten. Time and resource constrained scheduling [D]// Eindhoven University of Technology, 1994.
    [79] K. Kuchcinski] K. Kuchcinski. Embedded system synthesis by timing constraints solving [C]// Proc. Int. Symp. On System Synthesis, 1999:50-57.
    [80] Guido Costa Souza de Araujo. Code generation algorithms for digital signal processors [D]// Princeton, NJ, USA: Princeton University, 1997.
    [81] A. V. Aho and S. C. Johnson. Optimal Code Generation for Expression Trees [J]// Journal of the ACM, 23(3), July 1976:488-501.
    
    [82] Todd A. Proebsting. Least-Cost Instruction Selection for DAGs is NP-Complete, http://www.research.microsoft.com/toddpro/papers/proof.htm
    [83] P. Marwedel. Tree-based mapping of algorithms to predefined structures [C]// Int.Conf. on Computer-Aided Design, 1993: 586-593.
    [84] Alfred V. Aho, Mahadevan Ganapathi and Steven W. K. Tjiang. Code generation using tree matching and dynamic programming [J]// ACM Transactions on Programming Languages and Systems (TOPLAS), 11(4), October 1989: 491 — 516.
    [85] A. V. Aho, S. C. Johnson and J. D. Ullman. Code generation for expressions with common subexpressions [C]// Proceedings of the 3rd ACM SIGACT-SIGPLAN symposium on Principles on programming languages, 1976:19-31.
    [86] Bruno J, Sethi R. Register allocation for an one-register machine [R]. Pennsylvania: Pennsylvania State University, 1974.
    [87] Chaiton G J. Register allocation and spilling via graph coloring [C]// ACM SIGPLAN Notices, [s.l.]: ACM, 1982, 17(6):98-105.
    [88] BRIGGS P, COOPER K D, TOREZON L. Improvements to graph coloring register allocation [J]. ACM Transactions on Programming Languages and Systems, May 1994, 16(3):428-455.
    [89] Kudriavtsev A, Kogge P. Generation of permutations for SIMD processors [C]// Proceedings of the 2005 ACM SIGPLAN/SIGBED conference on Languages, Compilers, and Tools for Embedded Systems. [S.l.], [s.n], 2005:147-156
    [90] Briggs P. Register allocation via graph coloring [D]. Houston, USA: RICE University, 1992:23-32
    [91] Liao S, Devadas S, Keutzer K, et al. Code optimization techniques for embedded DSP microprocessors[C]// Design Automation Conference. [S.l.]: [s.n], 1995: 599-604.
    [92] Leupers R. Code optimization techniques for embedded processors[M]. Norwell: Kluwer Academic Publishers, 2000.
    [93] Pereira F. M., Palsberg J. Register allocation via coloring of Chordal graphs[C]// Proceedings of the 3rd Asian Symposium on Programming Languages and Systems. [S.l.], [s.n], 2005: 128-145.
    
    [94] Randy Allen, Ken kennedy. Optimizing compilers for modern architectures [M]// Elsevier Science Inc, 2001.
    [95] A. H. Timmer, M. T. J. Strik, J. L. van Meerbergen, et al. Conflict modeling and instruction scheduling in code generation for in-house DSP cores [C]// Proc. ACM/IEEE Design Automation Conference. 1995:593-598.
    [96] G Araujo, S. Malik, M. Lee. Using register-transfer paths in code generation for heterogeneous memory-register architectures [C]// Proc. 33rd Design Automation Conference. June 1996: 519-596.
    
    [97] G. Araujo, A. Sudarsanam, S. Malik. Instruction set design and optimizations for address computation in DSP processors [C]// The 9~(th) International Symposium on Systems Synthesis, November 1996:31-37.
    
    [98] Rajiv Gupta, Mary Lou Soffa, Tim Steele. Register allocation via clique separators [C]// SIGPLAN Notices, Proceedings of the ACM SIGPLAN '89 conference on programming language design and implementation, 24(7), 264-274, 1989.
    
    [99] Ken Kennedy. Analysis and register allocation for simple code structures [D]. Courant Institute, New York University, October 1971.
    
    [100] Fred C. Chow and John L. Hennessy. Register allocation by priority-based coloring [J]// SIGPLAN Notices, 19(6):222-232, June 1984.
    [101] http://www.ert.rwth-aachen.de/Proiekte/Tools/
    [102] http://www.eecs.harvard.edu/hube/research
    
    [103] Mirella Moura Moro, Zografoula Vagena, and Vassilis J. Tsotras. Tree-pattern queries on a Lightweight XML Processor [C]// In VLDB, 2005.
    [104] T. Chen, J. Lu and T. W. Ling. On Boosting Holism in XML Twig Pattern Matching using Structural Indexing Techniques [C]// In SIGMOD Conference,. 2005. 6.

© 2004-2018 中国地质图书馆版权所有 京ICP备05064691号 京公网安备11010802017129号

地址:北京市海淀区学院路29号 邮编:100083

电话:办公室:(+86 10)66554848;文献借阅、咨询服务、科技查新:66554700