What if?: Exploring masking of conditionals for performance portability

CG Auditorium
Jon Rood

In the GPU programming paradigm, loops are typically "hoisted" into what are hopefully perfectly nested, collapsible loops, which usually increases the size of the loop bodies along the way. This exposes maximum parallelism and gives each GPU thread more work, resulting in maximum device utilization. For good performance on the CPU, however, loops are typically "lowered" into several very simple, vectorizable loop bodies. Some performance portability tools attempt to perform this classical transformation automatically at compile time. With larger loop bodies for the GPU, it is easy for "if" statements to make their way into the loop bodies, where they can hurt performance through thread divergence. These larger loop bodies also make it more difficult for compilers to vectorize loops for CPU performance. So, what if we write our code in such a way that we avoid most, if not all, "if" conditions? Removing an "if" condition typically involves evaluating the conditional to an integer (0 or 1), computing both possible results, and multiplying each result by the conditional or its complement so that only the applicable one contributes. In this work, we use this technique both to reduce the code duplication that scientific applications often carry across multiple dimensions, and to analyze when the technique is advantageous or detrimental to performance.
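
As a rough illustration of the masking idea described in the abstract, the sketch below shows a loop with a branch next to a branch-free version of the same loop. It is not taken from the talk: the function names `select_branching` and `select_masked`, the coefficients `a` and `b`, and the choice of `x[i] > 0.0` as the condition are all hypothetical and chosen only for this example.

```cpp
#include <cstddef>
#include <vector>

// Branching version (hypothetical example): the "if" sits inside the loop body.
// Positive entries are scaled by a, all other entries by b.
std::vector<double> select_branching(const std::vector<double>& x, double a, double b) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        if (x[i] > 0.0) {
            y[i] = a * x[i];
        } else {
            y[i] = b * x[i];
        }
    }
    return y;
}

// Masked version (hypothetical example): no "if" in the loop body.
// The conditional is evaluated to an integer, both results are computed,
// and each result is multiplied by the mask or its complement.
std::vector<double> select_masked(const std::vector<double>& x, double a, double b) {
    std::vector<double> y(x.size());
    for (std::size_t i = 0; i < x.size(); ++i) {
        const int mask = (x[i] > 0.0);       // 1 if the condition holds, else 0
        const double if_true = a * x[i];     // result of the "then" branch
        const double if_false = b * x[i];    // result of the "else" branch
        y[i] = mask * if_true + (1 - mask) * if_false;
    }
    return y;
}
```

For finite inputs the two functions produce the same values. Because the masked form always evaluates both results, it does not guard against an invalid operation in the branch that would not have been taken: an infinity or NaN multiplied by a zero mask still yields NaN. Whether the masked form is actually faster depends on the compiler, the hardware, and the cost of the extra arithmetic.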

Speaker Description: 

Jon Rood is a Computational Scientist at the National Renewable Energy Laboratory. He currently focuses on performance engineering, performance portability, and software quality assurance for two DOE applications under the Exascale Computing Project, which model wind energy and combustion energy.
