accelerated computing; GPU; finite element assembly; Kokkos; CUDA; graph
Abstract :
The finite element method is commonly used to solve partial differential equations (PDEs). Its assembly procedure exposes natural parallelism and can therefore easily be made massively parallel. Frameworks like Kokkos help to write a single, performance-portable assembly code. The assembly can be expected to maximise GPU usage because the number of work units involved is typically much larger than the available concurrency. However, after the assembly, boundary conditions are often applied on small work sets, thereby potentially under-utilising the GPU processors. Moreover, boundary conditions can depend on one another, and these dependencies must be respected.
In this talk, we focus on how we used GPU streams to execute boundary condition functors concurrently, in order to maximise GPU occupancy while respecting their dependencies. More specifically, we will show how we used Kokkos::Graph to achieve this goal in a performance-portable way. We will also discuss how we designed a polymorphic hierarchy of functors for applying boundary conditions (Dirichlet, Neumann, periodic, ...) and how we map these device-polymorphic functors to nodes in the graph. The design and its performance (occupancy, impact of polymorphism, ...) will be analysed in the context of electromagnetism applications.
Disciplines :
Computer science
Author, co-author :
Arnst, Maarten ; Université de Liège - ULiège > Département d'aérospatiale et mécanique > Computational and stochastic modeling
Speaker :
Tomasetti, Romin ; Université de Liège - ULiège > Département d'aérospatiale et mécanique > Computational and stochastic modeling
Language :
English
Title :
Efficiently implementing FE boundary conditions using stream-orchestrated execution on GPU
Publication date :
07 March 2024
Event name :
SIAM Conference on Parallel Computing for Scientific Computing (PP24)