
Figure 1
UML diagram of the simulation object (sim) for the solver fluidsim.solvers.ns3d. Each block represents an object or instance of a class, and the object name and the class name are written as headings. The solid arrows show how objects are associated with each other. Methods and variables of significance to the user are displayed in the body of each object block.
Table 1
Specifications of the supercomputing cluster and software used for profiling and benchmarking.
| Component | Specification |
|---|---|
| Cluster | Beskow (Cray XC40 system with Aries interconnect) |
| CPU | Intel Xeon CPU E5-2695v4, 2.1 GHz |
| Operating System | SUSE Linux Enterprise Server 11, Linux Kernel 3.0.101 |
| No. of cores per node used | 32 |
| Maximum no. of nodes used | 32 (2D cases), 256 (3D cases) |
| Compilers | CPython 3.6.5, Intel C++ Compiler (icpc) 18.0.0 |
| Python packages | fluiddyn 0.2.3, fluidfft 0.2.3, fluidsim 0.2.1, numpy (OpenBLAS) 1.14.2, Cython 0.28.1, mpi4py 3.0.0, pythran 0.8.5 |

Figure 2
Profiling analysis of the 2D Navier-Stokes (fluidsim.solvers.ns2d) solver using a 1024 × 1024 grid (a) sequentially with the fft2d.with_fftw1d operator and (b) with 8 processes using the fft2d.mpi_with_fftwmpi2d operator.

Figure 3
Profiling analysis of the 3D Navier-Stokes (fluidsim.solvers.ns3d) solver. Top row: 128 × 128 × 128 grid solved (a) sequentially using the fft3d.with_fftw3d operator and (b) with 8 processes using the fft3d.mpi_with_fftwmpi3d operator. Bottom row: 512 × 512 × 512 grid using the fft3d.mpi_with_fftwmpi3d operator (c) with 2 processes and (d) with 128 processes.
Table 2
Elapsed times (in seconds) for twenty time steps of the 1024 × 1024 case with the 2D Navier-Stokes (fluidsim.solvers.ns2d) solver.
| FFT class | Time (np = 1) | FFT class | Time (np = 2) |
|---|---|---|---|
| fft2d.with_fftw1d | 6.63 | fft2d.mpi_with_fftw1d | 7.63 |
| fft2d.with_fftw2d | 5.59 | fft2d.mpi_with_fftwmpi2d | 3.91 |
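
The timings in Table 2 can be turned into a speedup and a parallel efficiency for the two-process runs. The sketch below is only an illustration: taking the fastest sequential FFT class as the baseline is an assumption made here, and the helper name `speedup_and_efficiency` is hypothetical, not part of fluidsim.

```python
# Elapsed times (s) for twenty time steps, copied from Table 2.
sequential = {"fft2d.with_fftw1d": 6.63, "fft2d.with_fftw2d": 5.59}
two_procs = {"fft2d.mpi_with_fftw1d": 7.63, "fft2d.mpi_with_fftwmpi2d": 3.91}

# Assumption: the baseline is the fastest sequential FFT class.
baseline = min(sequential.values())


def speedup_and_efficiency(elapsed, nb_proc, reference):
    """Return (speedup, efficiency) of a parallel run against a reference time."""
    speedup = reference / elapsed
    return speedup, speedup / nb_proc


results = {
    name: speedup_and_efficiency(elapsed, 2, baseline)
    for name, elapsed in two_procs.items()
}

for name, (speedup, efficiency) in results.items():
    print(f"{name}: speedup {speedup:.2f}, efficiency {efficiency:.2f}")
```

With this baseline, a speedup below 1 means the two-process run is slower than the best sequential run.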

Figure 4
Strong scaling benchmarks of the 2D Navier-Stokes (fluidsim.solvers.ns2d) solver. The number of cores np goes from 2 to 2¹⁰ = 1024. Crosses and dots correspond to 1024 × 1024 and 2048 × 2048 grid points, respectively.

Figure 5
Strong scaling benchmarks of the 3D Navier-Stokes (fluidsim.solvers.ns3d) solver on Beskow. The number of cores np goes from 2⁵ = 32 to 2¹³ = 8192. Crosses and dots correspond to 128³ and 1024³ grid points, respectively.
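
As a consistency check, the core counts of the strong scaling benchmarks can be related to the node counts in Table 1. The following sketch assumes that core counts are powers of two and that nodes are filled with 32 cores each (the value given in Table 1). The function name `node_counts` is hypothetical.

```python
CORES_PER_NODE = 32  # "No. of cores per node used" in Table 1


def node_counts(min_exp, max_exp, cores_per_node=CORES_PER_NODE):
    """Map each power-of-two core count to the number of nodes it occupies."""
    counts = {}
    for exponent in range(min_exp, max_exp + 1):
        nb_cores = 2**exponent
        # Ceiling division: a partially filled node still counts as one node.
        counts[nb_cores] = -(-nb_cores // cores_per_node)
    return counts


nodes_2d = node_counts(1, 10)  # 2 to 2**10 = 1024 cores (2D benchmarks)
nodes_3d = node_counts(5, 13)  # 2**5 = 32 to 2**13 = 8192 cores (3D benchmarks)

print(max(nodes_2d.values()), max(nodes_3d.values()))
```

Under these assumptions the largest runs occupy 32 nodes (2D) and 256 nodes (3D), which agrees with the maxima listed in Table 1.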
Table 3
Elapsed times (in seconds) for ten RK4 time steps for two bidimensional cases and the four CFD codes.
| Grid size | fluidsim | Dedalus | SpectralDNS | NS3D |
|---|---|---|---|---|
| 512² | 0.51 | 1.53 | 0.92 | 0.82 |
| 1024² | 2.61 | 8.00 | 3.48 | 3.96 |
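
To compare the codes more directly, the elapsed times in Table 3 can be normalised by the fluidsim time for the same grid. This is a minimal sketch using only the values from the table. The variable names are illustrative.

```python
# Elapsed times (s) for ten RK4 time steps, copied from Table 3.
codes = ["fluidsim", "Dedalus", "SpectralDNS", "NS3D"]
elapsed = {
    "512^2": [0.51, 1.53, 0.92, 0.82],
    "1024^2": [2.61, 8.00, 3.48, 3.96],
}

# Ratio of each code's time to the fluidsim time on the same grid
# (fluidsim is the first entry of each row).
relative = {
    grid: {code: time / times[0] for code, time in zip(codes, times)}
    for grid, times in elapsed.items()
}

for grid, ratios in relative.items():
    summary = ", ".join(f"{code}: {ratio:.2f}" for code, ratio in ratios.items())
    print(f"{grid}: {summary}")
```

A ratio above 1 means the code took longer than fluidsim for that case.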

Figure 6
Comparison of the execution times for a 3D case (128³, 10 time steps) between NS3D (blue bars) and fluidsim.solvers.ns3d (yellow bars). The first two bars correspond to the total time and the others to the main tasks in terms of time consumption, namely FFT, Runge-Kutta 4, curl, vector product and “projection”.
