Abstract
This work presents a general approach, termed the “Spy Element Method” (SEM), for parallelising on manycore processors workloads that feature dynamic dependencies between data items, such as graph traversals, remeshing methods and particle simulations. In the SEM, appropriately defined entities, called “spy elements”, inspect their neighbourhood and update all the dynamic data dependencies in parallel without atomic memory operations, including the dependencies among the spy elements themselves, among the original objects of the problem, and between entities of different kinds. Applying the SEM to meshless simulation models obviates binning algorithms that rely on sorting or atomics, and at the same time makes their implementation fully particle-centric. On the NVIDIA GeForce GTX 480, an optimised particle-based fluid simulator runs 1.4 to 2.3 times faster when binning aided by a state-of-the-art parallel sorter is eliminated and data updating is delegated to the SEM; it is also at least 40 times faster than a highly optimised single-threaded version running on a single core of a 2.66 GHz Intel Xeon X5650.