Abstract
High-level C++ proxies for the convenient manipulation of subvectors and submatrices on OpenCL-enabled devices are introduced. It is demonstrated that the programming convenience of standard host-based code can be retained using native C++ language features only, even if massively parallel computing architectures such as graphics processing units are employed. The required modifications of the underlying OpenCL kernels are discussed and a case study of an implementation of the QR-factorization is given. Benchmark results confirm that the convenience of purely CPU-based libraries can be preserved without sacrificing performance of OpenCL-enabled devices, particularly graphics processing units.