Graphics Processing Units (GPUs) and other parallel devices are widely available and have the potential for accelerating a wide class of algorithms. However, expert programming skills are required to achieve maximum performance. These devices expose low-level hardware details through imperative programming interfaces which inevitably results in non-performance-portable programs highly tuned for a specific device.
Functional programming models have recently seen a renaissance in the systems community as they offer a solution to tackle the performance portability challenge. Recent work has shown that a functional representation can be effectively used to automatically derive and encode low-level hardware-specific parallelism strategies resulting in high performance across a range of target architectures. However, the translation of the functional representation to the actual imperative program expected by the hardware interface is typically ad hoc with no correctness guarantees.
In this paper, we present a formalised strategy-preserving translation from high level functional code to low level data race free parallel imperative code. This translation is formulated and proved correct within a language we call Data Parallel Idealised Algol (DPIA), a dialect of Reynolds' Idealised Algol. Performance results on GPUs and a multicore CPU show that the formalised translation process leads to performance on a par with ad hoc approaches.