
Figure 1
Class hierarchy illustrating the object-oriented approach. The sequential classes are shown in red, the CUDA-based classes in magenta and the MPI-based classes in green. The arrows represent inheritance from parent to child class.

Figure 2
Flowchart illustrating how the C++ and Python APIs are built and used for one particular class, viz. FFT2DMPIWithFFTWMPI2D. The dotted arrows in the C++ part stand for include statements and reflect the class hierarchy; those in the Python part indicate how the different codes are imported. At the bottom, a smaller flowchart demonstrates how to use the API by writing user code.

Figure 3
Speedup computed from the median of the elapsed times for 3D FFT (384 × 1152 × 1152, left: fft and right: ifft) on Occigen.

Figure 4
Speedup computed from the median of the elapsed times for 3D FFT (1152 × 1152 × 1152, left: fft and right: ifft) on Occigen.

Figure 5
Speedup computed from the median of the elapsed times for 3D FFT (384 × 1152 × 1152, left: fft and right: ifft) on Beskow.

Figure 6
Speedup computed from the median of the elapsed times for 3D FFT (1152 × 1152 × 1152, left: fft and right: ifft) on Beskow.

Figure 7
Speedup computed from the median of the elapsed times for 3D FFT (320 × 640 × 640) on cluster8 at LEGI.

Figure 8
Speedup computed from the median of the elapsed times for 2D FFT (2160 × 2160) on cluster8 at LEGI.

Figure 9
Elapsed time (smaller is better) for the projection function for different implementations and tools. The shape of the arrays is (128, 128, 65). The dotted lines mark the Fortran times for ease of comparison.
