Study on Parallel Optimizations for Transactional Memory

Original title: 事务内存的并行优化研究
Author: Yan Zhichao (晏志超)
Degree level: Doctoral
Discipline: Computer Architecture
Keywords: Transactional Memory System; Parallel Programming; Thread Level Parallelism; Cache Coherence; Performance Optimization
Degree year: 2012
Advisor: Feng Dan (冯丹)
Discipline code: 081201
Degree-granting institution: Huazhong University of Science and Technology
Submission date: 2012-10-09

Abstract

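The shift of processor architecture from single-core to multi-core has brought a series of fundamental changes to the whole computing field. The most immediate problem is how to keep software performance growing on multi-core processors, which requires programmers to write applications with explicit thread-level parallelism in order to exploit the parallelism in their programs. This change greatly increases the programming burden and sharply lowers programmer productivity; with software growing ever larger, improving programmer productivity is regarded as a core technology of modern software development. Transactional memory brings the database concept of a transaction into the parallel programming model. On the one hand, the transactional interface raises the level of abstraction and, in particular, provides composability, which relieves programmers of the burden of managing concurrent threads that compete for shared data. On the other hand, code regions that may race on shared data are given transactional semantics and can be executed concurrently and optimistically, that is, a transaction conflict is handled only when it is actually detected, which raises the thread-level parallelism of the program. Transactional memory is therefore regarded as an effective tool for parallel programming on future multi-core platforms; it has attracted wide attention from industry and academia and has become an active research topic.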
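     An analysis and evaluation of existing hardware transactional memory systems shows that, as the number of concurrently executing transactions grows, and especially under coarse-grained, high-contention transactional workloads, concurrent performance drops sharply because of transaction conflicts, which hurts the performance of the whole application. Starting from the parallelism of concurrently executing transactions, this thesis proposes three key techniques for improving the concurrency of existing transactional memory systems.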
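     Existing version management methods in transactional memory all incur extra data movement at run time. Extra data movement not only lengthens a transaction's execution time but also interferes with surrounding concurrent transactions that access the transaction's shared data, causing additional transaction conflicts. Under coarse-grained, high-contention workloads in particular, this can lead to more conflicts, which block concurrent transactions and in turn trigger more data movement, forming a vicious cycle that lowers system concurrency and hurts parallel application performance. To address this problem, this thesis proposes a single-update version management method that reduces data movement. It designs a fully associative hardware redirect table, integrated into the processor pipeline, to manage the old and new versions of data during transaction execution, and, based on an analysis of transactional workload characteristics, organizes the redirect table in two levels so as to exploit concurrency while reducing hardware cost. By removing the extra data movement in version management, the method shortens each transaction's own execution time and also reduces its impact on surrounding concurrent transactions, improving the concurrency of the whole system.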
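     The composability of the transactional interface makes it easy for programmers to write parallel transactional applications as coarse-grained, high-contention long transactions. However, because a coarse-grained long transaction runs for so long, the probability that other transactions conflict with it rises sharply; it easily blocks surrounding concurrent transactions and severely limits their parallelism. To solve this problem, this thesis proposes a concurrent speculative execution technique for coarse-grained long transactions. It integrates thread-level speculation into the hardware transactional memory system, modifies the hardware transactional memory architecture to support checking of sequential program order, and applies speculative execution to function calls and loop structures to accelerate the application. The method extracts code segments that can run concurrently from a coarse-grained long transaction and uses multiple threads to execute that transaction in parallel. This speeds up the long transaction and also lowers the probability of conflicts between it and surrounding concurrent transactions, improving the concurrency of the whole system.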
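     Transactions in transactional applications differ in length and in the sizes of their read and write sets, and they interact in many different ways, so a simple hardware selector based on history information cannot accurately choose the proper execution mode for a transaction. A wrong choice adds overhead and also reduces the concurrency of surrounding transactions. To solve this problem, this thesis proposes a parallel optimization technique based on transaction behavior analysis. It designs a software-assisted transaction scheduling and management module that dynamically collects statistics on transaction execution, mines the behavioral characteristics of concurrent transactions from these statistics, and uses them as a key factor in scheduling transactions and resolving transaction conflicts, thereby helping the conflict management module of the hardware transactional memory system exploit the concurrency of concurrently executing transactions. Compared with conventional transactional memory systems based on hardware prediction, the method offers better prediction and, by extracting behavioral characteristics to optimize the execution of concurrent transactions, improves the performance of the transactional memory system.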
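     The single-update version management technique and the concurrent speculative execution technique for coarse-grained long transactions exploit intra-transaction parallelism, by reducing data movement inside a transaction and by executing coarse-grained long transactions in parallel, and thus improve the performance of the transactional memory system. The parallel optimization technique based on transaction behavior analysis goes further: it collects the behavioral characteristics of concurrent transactions and schedules their execution to exploit inter-transaction parallelism. The three optimizations can be combined to exploit both intra- and inter-transaction parallelism and improve the performance of transactional memory systems.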
The sea change from single-core to multi-core processor architecture has brought a series of fundamental changes to the entire computing community. The first problem is how to ensure that software performance can continue to grow in the multi-core era, which requires programmers to explicitly use thread-level parallelism when writing applications in order to exploit the parallelism in their programs. However, this change significantly increases the burden on programmers and may greatly reduce their productivity; as software keeps growing in scale, improving programmer productivity is regarded as a core technology of modern software development. Transactional memory borrows the transaction concept from the database community and brings it to the parallel programming model. On the one hand, the transaction interface raises the abstraction level, especially through its composability, which reduces the programmer's burden of managing concurrent threads that compete for shared data. On the other hand, assigning transactional semantics to possibly racing code blocks allows them to run optimistically: only when a transactional conflict is actually detected does transactional memory invoke the appropriate handling, which improves the degree of parallelism of the program. Therefore, transactional memory is regarded as an effective tool for parallel programming on future multi-core platforms, has attracted a lot of attention from industry and academia, and has become a research hotspot.
     By analyzing and evaluating existing hardware transactional memory systems, we found that as the number of concurrently executing transactions increases, especially under coarse-grained and high-contention workloads, transaction conflicts cause concurrent performance to decline dramatically, which degrades the application's overall performance. This thesis concentrates on optimizing the parallelism of concurrent transactions in transactional memory systems and proposes three key techniques to exploit the concurrency of transactions in such systems.
     Existing version management schemes of transactional memory systems incur extra data movements. These data movements not only extend transaction execution time but also affect surrounding concurrent transactions' access to the shared data, which may introduce extra transactional conflicts among the concurrent transactions. The situation worsens under coarse-grained and high-contention transactional workloads: the system may see more transactional conflicts, which block concurrent transaction execution and cause more data movements, leading to a vicious cycle that limits concurrency and hurts the application's performance. To address this problem, we propose a novel single-update version management scheme that reduces the extra data movements in version management. We design a hardware-based fully associative redirect table to manage the old and new versions during a transaction's execution, integrate it into the processor's pipeline, and analyze the characteristics of transactional workloads to design a two-level redirect table that exploits parallelism while reducing hardware cost. This method not only shortens the execution time of each transaction but also alleviates possible interference with surrounding transactions, thus efficiently improving the transactional memory system's concurrency.
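As a rough illustration of the redirect idea, the following Python sketch models version management in which a transactional write stores the new value once in a spare slot and a table entry redirects the address to that slot. Commit and abort then only update table entries and move no data. The class `RedirectMemory`, its method names, and the slot-recycling policy are assumptions made for this example; the hardware organization proposed in the thesis (fully associative, two-level, integrated into the pipeline) is not modeled here.

```python
class RedirectMemory:
    """Toy software model of redirect-based version management (hypothetical)."""

    def __init__(self, size, spare):
        # Backing store: `size` home slots followed by `spare` extra slots.
        self.cells = [0] * (size + spare)
        self.free = list(range(size, size + spare))  # currently unused slots
        self.committed = {}   # address -> slot holding the committed version
        self.pending = {}     # address -> slot holding the transactional version
        self.data_writes = 0  # number of data values actually written

    def read_committed(self, addr):
        """Value visible to other transactions (the old version)."""
        return self.cells[self.committed.get(addr, addr)]

    def read(self, addr):
        """Value visible inside the running transaction (new version first)."""
        slot = self.pending.get(addr, self.committed.get(addr, addr))
        return self.cells[slot]

    def tx_write(self, addr, value):
        """Write the new version once, into a redirected slot."""
        if addr not in self.pending:
            # Assumption: a real design needs an overflow path when no slot is free.
            self.pending[addr] = self.free.pop()
        self.cells[self.pending[addr]] = value
        self.data_writes += 1

    def commit(self):
        """Publish new versions by switching table entries; no data is copied."""
        for addr, slot in self.pending.items():
            old_slot = self.committed.get(addr, addr)
            self.committed[addr] = slot
            self.free.append(old_slot)  # the old version's slot becomes spare
        self.pending.clear()

    def abort(self):
        """Discard new versions by dropping table entries; no data is restored."""
        self.free.extend(self.pending.values())
        self.pending.clear()
```

In this model the old version stays readable through `read_committed` while the transaction runs, and `data_writes` grows only in `tx_write`, which is the property the single-update scheme aims for.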
     The composability of the transaction interface leads programmers to naturally write transactional applications composed of coarse-grained, high-contention, long transactions. However, because the execution time of a coarse-grained long transaction is so long, the probability of transactional conflicts increases greatly; the transaction easily obstructs the execution of concurrent transactions and thus significantly reduces surrounding transactions' concurrency. To solve this problem, we propose a parallel speculative execution technique for coarse-grained long transactions. We integrate thread-level speculation into the hardware transactional memory system, modify the transactional memory architecture to support checking the program's sequential ordering, and apply the method to spawn speculative threads from function calls and loop structures to accelerate the program's execution. This method extracts speculative code segments from a coarse-grained long transaction and assigns multiple threads to execute them in parallel, accelerating the long transaction's execution. Moreover, it reduces the probability of transactional conflicts with surrounding transactions, improving the overall concurrency of the transactional memory system.
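The ordering check behind thread-level speculation can be sketched in software. In the Python model below, every loop iteration first runs against the pre-loop state while its read and write sets are recorded; the iterations then commit in program order, and an iteration that read an address written by an earlier iteration is squashed and re-executed. The function `run_speculative` and the `body(read, write)` calling convention are hypothetical names for this example, and the model is sequential and deterministic, so it shows only the dependence check, not the hardware mechanism or real thread spawning described in the thesis.

```python
def run_speculative(iterations, memory):
    """Execute loop bodies speculatively and commit them in program order.

    iterations: list of callables body(read, write)
    memory:     dict mapping address -> value, updated in place
    Returns the number of iterations that had to be squashed and re-executed.
    """

    def execute(body, view):
        reads, writes = set(), {}

        def read(addr):
            if addr in writes:       # forward the iteration's own buffered write
                return writes[addr]
            reads.add(addr)
            return view[addr]

        def write(addr, value):
            writes[addr] = value     # buffer the write until commit

        body(read, write)
        return reads, writes

    # Speculative phase: every iteration sees only the pre-loop snapshot.
    snapshot = dict(memory)
    results = [execute(body, snapshot) for body in iterations]

    squashed = 0
    written_earlier = set()          # addresses committed by earlier iterations
    for body, (reads, writes) in zip(iterations, results):
        if reads & written_earlier:  # stale read: sequential order was violated
            squashed += 1
            reads, writes = execute(body, memory)  # re-run on up-to-date state
        memory.update(writes)        # commit in program order
        written_earlier.update(writes)
    return squashed
```

Independent iterations commit without being re-run, so only true read-after-write dependences between iterations cost extra work, and the final memory state matches sequential execution.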
     Because concurrent transactions differ in length, read set, and write set, their interactions lead to varied behaviors, so a history-based hardware transactional mode selector can no longer accurately choose the proper transactional execution mode. A wrong choice not only brings extra overhead but also interferes with surrounding transactions and hurts concurrency. To address this problem, we propose a parallel optimization technique based on transaction behavior analysis. We design a software-based module to manage transaction scheduling, which collects and statistically analyzes each transaction's execution information, uses these statistics to mine the behavioral characteristics of the concurrent transactions, and takes this information as the main factor in scheduling transaction execution and resolving transactional conflicts. This auxiliary conflict management component helps the underlying hardware exploit more concurrency among the concurrent transactions. The method provides higher prediction accuracy for scheduling concurrent transactions than the hardware-based approach and gains further performance by mining the behavioral characteristics of the concurrent transactions.
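One simple way to picture statistics-driven scheduling is a table of observed conflict rates between pairs of transaction sites, consulted before a transaction starts. The Python sketch below is only an assumed illustration: the class `TxScheduler`, the `threshold` and `min_samples` parameters, and the pairwise conflict-rate heuristic are not taken from the thesis, whose module and the statistics it gathers are not detailed in this abstract.

```python
from collections import defaultdict


class TxScheduler:
    """Hypothetical software scheduler driven by observed conflict statistics."""

    def __init__(self, threshold=0.5, min_samples=4):
        self.threshold = threshold         # conflict rate above which we serialize
        self.min_samples = min_samples     # observations needed before trusting it
        self.overlaps = defaultdict(int)   # pair -> times the two ran concurrently
        self.conflicts = defaultdict(int)  # pair -> times they conflicted

    @staticmethod
    def _pair(a, b):
        # Order-independent key for a pair of transaction site names.
        return (a, b) if a <= b else (b, a)

    def record(self, a, b, conflicted):
        """Record one concurrent execution of sites a and b and its outcome."""
        pair = self._pair(a, b)
        self.overlaps[pair] += 1
        if conflicted:
            self.conflicts[pair] += 1

    def conflict_rate(self, a, b):
        pair = self._pair(a, b)
        seen = self.overlaps.get(pair, 0)
        return self.conflicts.get(pair, 0) / seen if seen else 0.0

    def should_start(self, tx, running):
        """Return False if tx is likely to conflict with a running transaction."""
        for other in running:
            seen = self.overlaps.get(self._pair(tx, other), 0)
            if seen >= self.min_samples and self.conflict_rate(tx, other) > self.threshold:
                return False
        return True
```

A runtime would call `record` whenever two transactions overlap and `should_start` before dispatching a new one, delaying it while a frequently conflicting transaction is still running.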
     In this thesis, the single-update version management scheme and the parallel speculative execution technique for coarse-grained long transactions reduce data movements and execute coarse-grained long transactions in parallel to exploit the parallelism within transactions. The parallel optimization technique based on transaction behavior analysis collects the behavioral characteristics of the concurrent transactions and schedules their execution to exploit the parallelism among concurrent transactions. The three techniques can be combined to exploit parallelism both within and among transactions, thus improving the parallel performance of transactional memory systems.
