High-order accurate simulation of incompressible turbulent flows on many parallel GPUs of a hybrid-node supercomputer

Zolfaghari, Hadi; Becsek, Barna; Nestola, Maria; Sawyer, Will; Krause, Rolf; Obrist, Dominik (2019). High-order accurate simulation of incompressible turbulent flows on many parallel GPUs of a hybrid-node supercomputer. Computer Physics Communications, 244, pp. 132-142. Elsevier 10.1016/j.cpc.2019.06.012

online_version.pdf - Published Version
Restricted to registered users only
Available under License Publisher holds Copyright.


Turbulent incompressible flows play an important role in a broad range of natural and industrial processes. High-order direct numerical simulations are often used to resolve the spatio-temporal scales of such flows, but these high-fidelity simulations demand extensive computational resources and often incur prohibitive costs. Recent advances in modern computing platforms, such as GPU-powered hybrid-node supercomputers, promise to make high-fidelity CFD feasible at large scales. In this work, we propose methods for accelerating a distributed-memory high-order incompressible Navier–Stokes solver using the NVIDIA Pascal GPUs of a Cray XC40/50 supercomputer. Arithmetically intensive or frequently invoked routines were ported to the GPUs using CUDA C, and host-side driver routines were developed to invoke the CUDA C "external" kernels from the FORTRAN legacy code. The numerical methods for some of the most intensive operations, namely the multigrid preconditioners, were modified to suit the SIMD execution model of graphics processors. Customized unit testing ensured double-precision accuracy of the GPU computations, and the optimization layer preserved the memory structure of the legacy code. Profiling confirmed that the backbone distributed-memory communications increase the number of dynamic CPU–GPU memory copies, which offsets part of the computational performance. Strong scalability of the entire flow solver and of the stand-alone pressure solver was examined on up to 512 P100 GPUs; strong-scaling efficiency decreased at higher GPU counts, probably owing to a less favorable communication-to-computation ratio. Weak scalability of the entire solver was tested on up to 4096 P100 GPUs for two problems of different sizes. The solver maintained nearly ideal weak scalability for the larger problem, illustrating the potential of GPUs for highly resolved flows.
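The abstract notes that multigrid preconditioner components were reworked to suit the SIMD execution model of GPUs. A common way to achieve this (a sketch of the general technique, not necessarily the authors' exact scheme) is to replace the inherently sequential Gauss–Seidel smoother with a red-black colored variant, in which all grid points of one color can be updated independently by separate threads:

```cuda
// Hypothetical red-black Gauss-Seidel smoother for a 2D Poisson
// problem  laplacian(u) = f  on an n x n interior grid with a
// one-cell boundary layer (5-point stencil, spacing h, h2 = h*h).
// All points of one color are mutually independent, so each can be
// updated by its own thread -- the property that makes the smoother
// fit the SIMD execution model of GPUs.
__global__ void rb_gauss_seidel(double *u, const double *f,
                                int n, double h2, int color) {
    int i = blockIdx.y * blockDim.y + threadIdx.y + 1; // row, 1..n
    int j = blockIdx.x * blockDim.x + threadIdx.x + 1; // col, 1..n
    if (i > n || j > n) return;
    if (((i + j) & 1) != color) return;  // update one color per sweep
    int s = n + 2;                       // row stride incl. boundary
    u[i * s + j] = 0.25 * (u[(i - 1) * s + j] + u[(i + 1) * s + j] +
                           u[i * s + j - 1] + u[i * s + j + 1] -
                           h2 * f[i * s + j]);
}
```

The host launches the kernel twice per smoothing sweep, once with color = 0 and once with color = 1, so each half-sweep is fully data-parallel.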
The GPU-enabled solver was finally deployed on 192 GPUs for a scale-resolving simulation of flow transition in the wake of a solid sphere at Re = 3700. The time-averaged pressure coefficient along the sphere surface agreed well with previously reported data from CPU-based direct numerical simulations and experiments.
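The abstract describes host-side driver routines that let the FORTRAN legacy code invoke CUDA C "external" kernels. A standard pattern for this interop (a sketch under assumptions; the routine name `axpy_gpu` and the kernel are hypothetical, not taken from the paper) is an `extern "C"` driver compiled with nvcc, matched on the Fortran side by an `ISO_C_BINDING` interface:

```cuda
#include <cuda_runtime.h>

// Hypothetical device kernel: y = a*x + y on n doubles.
__global__ void axpy_kernel(int n, double a, const double *x, double *y) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) y[i] = a * x[i] + y[i];
}

// Host-side driver with C linkage. extern "C" suppresses C++ name
// mangling so the symbol resolves against the Fortran declaration
// bind(c, name="axpy_gpu").
extern "C" void axpy_gpu(int n, double a, const double *x, double *y) {
    double *dx, *dy;
    size_t bytes = (size_t)n * sizeof(double);
    cudaMalloc(&dx, bytes);
    cudaMalloc(&dy, bytes);
    cudaMemcpy(dx, x, bytes, cudaMemcpyHostToDevice);
    cudaMemcpy(dy, y, bytes, cudaMemcpyHostToDevice);
    int block = 256;
    axpy_kernel<<<(n + block - 1) / block, block>>>(n, a, dx, dy);
    cudaMemcpy(y, dy, bytes, cudaMemcpyDeviceToHost);
    cudaFree(dx);
    cudaFree(dy);
}

/* Matching Fortran-side interface (sketch):
 *   interface
 *     subroutine axpy_gpu(n, a, x, y) bind(c, name="axpy_gpu")
 *       use iso_c_binding
 *       integer(c_int), value :: n
 *       real(c_double), value :: a
 *       real(c_double)        :: x(*), y(*)
 *     end subroutine
 *   end interface
 */
```

Passing scalars by `value` and arrays by reference keeps the Fortran call site unchanged, which is how a legacy code can adopt GPU kernels without restructuring its memory layout.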

Item Type:

Journal Article (Original Article)


10 Strategic Research Centers > ARTORG Center for Biomedical Engineering Research > ARTORG Center - Cardiovascular Engineering (CVE)

Graduate School:

Graduate School for Cellular and Biomedical Sciences (GCB)

UniBE Contributor:

Zolfaghari, Hadi; Becsek, Barna Errol Mario; Obrist, Dominik


000 Computer science, knowledge & systems
500 Science > 530 Physics
600 Technology
600 Technology > 620 Engineering

Depositing User:

Hadi Zolfaghari

Date Deposited:

08 Oct 2019 07:41

Last Modified:

23 Oct 2019 12:09

Publisher DOI:

10.1016/j.cpc.2019.06.012
