Various Consequences: Hybrid Parallelism Approaches for CFD

Saturday, December 17, 2016

Hybrid Parallelism Approaches for CFD

This previous post, Plenty of Room at Exascale, focuses on one specific commercial approach to scaling CFD to large problems on heterogeneous hardware (CPU & GPU) clusters. Here are some more references I found interesting on this sort of approach.

Strategies

Recent progress and challenges in exploiting graphics processors in computational fluid dynamics provides some general strategies, drawn from its review of the literature, for using multiple levels of parallelism across GPUs, CPU cores, and cluster nodes:

Global memory should be arranged to coalesce read/write requests, which can improve performance by an order of magnitude (theoretically, up to 32 times: the number of threads in a warp)
Shared memory should be used for global reduction operations (e.g., summing up residual values, finding maximum values) such that only one value per block needs to be returned
Use asynchronous memory transfer, as shown by Phillips et al. and DeLeon et al. when parallelizing solvers across multiple GPUs, to limit the idle time of either the CPU or GPU.
Minimize slow CPU-GPU communication during a simulation by performing all possible calculations on the GPU.
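
To make the first two strategies concrete, here is a minimal sketch in plain Python (not GPU code) that just does the bookkeeping. It assumes a 32-thread warp, 4-byte floats, and 128-byte memory transactions; the transaction size is my assumption for illustration, and the function names are hypothetical rather than taken from the paper. The first function counts how many memory segments a warp touches for a given access stride, and the second emulates a reduction that hands back only one partial value per block.

```python
WARP_SIZE = 32        # threads per warp
WORD_BYTES = 4        # single-precision float
SEGMENT_BYTES = 128   # assumed size of one global-memory transaction


def transactions_per_warp(stride):
    """Count distinct segments touched when thread t reads element t * stride."""
    segments = {(t * stride * WORD_BYTES) // SEGMENT_BYTES
                for t in range(WARP_SIZE)}
    return len(segments)


def blockwise_max(values, block_size):
    """Emulate a two-stage reduction.

    Stage 1: each block reduces its own slice to a single partial value
    (on a GPU this is the part that would live in shared memory).
    Stage 2: the partials, one per block, are reduced to the final answer.
    Returns the final maximum and the number of partial values produced.
    """
    partials = [max(values[i:i + block_size])
                for i in range(0, len(values), block_size)]
    return max(partials), len(partials)
```

With unit stride the warp's 32 reads fall in one segment; with a stride of 32 floats every thread lands in its own segment, which is where the theoretical factor of 32 comes from. For the reduction, a 1024-entry residual array with blocks of 256 produces only 4 partial values to pass along instead of 1024.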

Example Implementations

There are two example implementations on GitHub that were used to illustrate the scaling with grid size for some simple 2D problems.

One of the interesting references from the paper mentioned above is Hybridizing S3D into an Exascale Application using OpenACC. The authors combine OpenACC directives for GPU processing, OpenMP directives for multi-core processing, and MPI for multi-node processing. Their three-level hybrid approach performs better than any single approach alone, and with some clever algorithm tweaks they are able to run the same code on a node without a GPU without much of a performance hit.
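
The layering can be pictured as nested domain decomposition. The sketch below is only index bookkeeping in plain Python under my own assumptions, not the S3D scheme: cells are split into one slab per node (the MPI level), and each slab either goes whole to an accelerator (the OpenACC level) or is split across CPU threads (the OpenMP level) when the node has no GPU. The names `split_range` and `hybrid_decomposition` are hypothetical.

```python
def split_range(start, stop, parts):
    """Split [start, stop) into `parts` contiguous, nearly equal chunks."""
    base, extra = divmod(stop - start, parts)
    chunks = []
    lo = start
    for p in range(parts):
        # The first `extra` chunks each take one leftover cell.
        hi = lo + base + (1 if p < extra else 0)
        chunks.append((lo, hi))
        lo = hi
    return chunks


def hybrid_decomposition(n_cells, n_threads, has_gpu):
    """Assign cell ranges to workers; has_gpu holds one boolean per node.

    Returns a list of (node, worker, lo, hi) tuples covering [0, n_cells).
    """
    plan = []
    slabs = split_range(0, n_cells, len(has_gpu))
    for node, (lo, hi) in enumerate(slabs):
        if has_gpu[node]:
            # Whole slab is offloaded to the node's accelerator.
            plan.append((node, "gpu", lo, hi))
        else:
            # Fallback: same slab, split across the node's CPU threads.
            for thread, (tlo, thi) in enumerate(split_range(lo, hi, n_threads)):
                plan.append((node, "cpu%d" % thread, tlo, thi))
    return plan
```

For example, `hybrid_decomposition(10, 3, [True, False])` gives the first node's five cells to its GPU and splits the second node's five cells across three threads. The point of the sketch is that the per-node slab is the same either way, so the fallback path changes who does the work and not how the domain is partitioned.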

1 comment:

Joshua Stults, Sunday, December 18, 2016
Here's an example Parareal implementation in Python for the 3D Burgers' equation on GitHub; also the paper to go with the code.
