Saturday, December 17, 2016

Hybrid Parallelism Approaches for CFD

The previous post, Plenty of Room at Exascale, focused on one specific commercial approach to scaling CFD to large problems on heterogeneous (CPU & GPU) clusters. Here are some more references I found to be interesting reading on this sort of approach.

Strategies

Recent progress and challenges in exploiting graphics processors in computational fluid dynamics offers some general strategies, based on its review of the literature, for exploiting multiple levels of parallelism across GPUs, CPU cores, and cluster nodes:
  • Global memory should be arranged to coalesce read/write requests, which can improve performance by an order of magnitude (theoretically, up to 32 times: the number of threads in a warp).
  • Shared memory should be used for global reduction operations (e.g., summing up residual values, finding maximum values) so that only one value per block needs to be returned; a small sketch of both of these points follows this list.
  • Use asynchronous memory transfer, as shown by Phillips et al. and DeLeon et al. when parallelizing solvers across multiple GPUs, to limit the idle time of either the CPU or GPU (see the second sketch below).
  • Minimize slow CPU-GPU communication during a simulation by performing all possible calculations on the GPU.
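
As a concrete illustration of the first two points (coalesced global-memory access and shared-memory reductions), here is a minimal CUDA sketch of a per-block maximum-residual reduction. This is my own illustration rather than code from the review; blockMaxResidual and the array names are made up.

```cuda
#include <cuda_runtime.h>

// Each block reduces its slice of the residual array to a single maximum.
__global__ void blockMaxResidual(const double *residual, double *blockMax, int n)
{
    extern __shared__ double sdata[];

    int tid = threadIdx.x;
    int i   = blockIdx.x * blockDim.x + threadIdx.x;

    // Coalesced load: consecutive threads read consecutive addresses.
    sdata[tid] = (i < n) ? fabs(residual[i]) : 0.0;
    __syncthreads();

    // Tree reduction in shared memory (assumes blockDim.x is a power of two).
    for (int s = blockDim.x / 2; s > 0; s >>= 1) {
        if (tid < s)
            sdata[tid] = fmax(sdata[tid], sdata[tid + s]);
        __syncthreads();
    }

    // Only one value per block goes back to global memory; the short
    // blockMax array is then reduced on the host or by a second small kernel.
    if (tid == 0)
        blockMax[blockIdx.x] = sdata[0];
}

// Launch, e.g.:
// blockMaxResidual<<<nBlocks, 256, 256 * sizeof(double)>>>(d_res, d_blockMax, n);
```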

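The asynchronous-transfer point can be sketched with CUDA streams: start the interior update, overlap the halo copy with it, and only then apply the boundary update. Again this is a hand-rolled illustration, not code from Phillips et al. or DeLeon et al.; interiorKernel, haloKernel, and the halo handling are placeholders.

```cuda
#include <cuda_runtime.h>

__global__ void interiorKernel(double *f, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) f[i] *= 0.5;            // stand-in for the real interior update
}

__global__ void haloKernel(double *f, const double *halo, int nHalo)
{
    int i = threadIdx.x;
    if (i < nHalo) f[i] += halo[i];    // stand-in for applying boundary data
}

void step(double *d_field, int nInterior,
          const double *h_halo, double *d_halo, int nHalo)
{
    cudaStream_t sCompute, sCopy;
    cudaStreamCreate(&sCompute);
    cudaStreamCreate(&sCopy);

    // The interior update does not need the new halo, so launch it first.
    interiorKernel<<<(nInterior + 255) / 256, 256, 0, sCompute>>>(d_field, nInterior);

    // Overlap the halo transfer with that kernel. h_halo should be pinned
    // (cudaMallocHost) for the copy to be truly asynchronous.
    cudaMemcpyAsync(d_halo, h_halo, nHalo * sizeof(double),
                    cudaMemcpyHostToDevice, sCopy);

    // The boundary update can only start once the copy has landed.
    cudaStreamSynchronize(sCopy);
    haloKernel<<<1, 256, 0, sCompute>>>(d_field, d_halo, nHalo);

    cudaStreamSynchronize(sCompute);
    cudaStreamDestroy(sCompute);
    cudaStreamDestroy(sCopy);
}
```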

Example Implementations


There are two example implementations on GitHub that were used to illustrate scaling with grid size for some simple 2D problems.

One of the interesting references from the review mentioned above is Hybridizing S3D into an Exascale Application using OpenACC. The authors combine OpenACC directives for GPU processing, OpenMP directives for multi-core processing, and MPI for multi-node processing. This three-level hybrid approach performs better than any single approach alone, and with some clever algorithmic tweaks the same code can run on a node without a GPU without too large a performance hit.
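To make the three-level layering concrete, here is a rough, self-contained sketch of that kind of decomposition: MPI between nodes, OpenMP across the host cores, and a GPU kernel for the bulk of the work. Note that the S3D work uses OpenACC directives for the GPU level; plain CUDA stands in for it here, and relaxKernel, the grid size, and the "boundary" loop are illustrative placeholders rather than anything from the paper.

```cuda
#include <mpi.h>
#include <omp.h>
#include <cuda_runtime.h>
#include <vector>

__global__ void relaxKernel(double *u, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) u[i] *= 0.99;                    // stand-in for the real per-cell update
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);                     // level 1: one rank per node (or per GPU)
    int rank, nranks;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nranks);

    const int nLocal = 1 << 20;                 // this rank's share of the grid
    std::vector<double> u(nLocal, 1.0);

    double *d_u = nullptr;
    cudaMalloc(&d_u, nLocal * sizeof(double));
    cudaMemcpy(d_u, u.data(), nLocal * sizeof(double), cudaMemcpyHostToDevice);

    // Level 3: the GPU handles the expensive bulk update (launch is asynchronous).
    relaxKernel<<<(nLocal + 255) / 256, 256>>>(d_u, nLocal);

    // Level 2: host cores do leftover work (here, a token "boundary" sum)
    // via OpenMP while the GPU kernel runs.
    double boundarySum = 0.0;
    #pragma omp parallel for reduction(+ : boundarySum)
    for (int i = 0; i < 1024; ++i)
        boundarySum += u[i];

    cudaDeviceSynchronize();
    cudaMemcpy(u.data(), d_u, nLocal * sizeof(double), cudaMemcpyDeviceToHost);

    // Level 1: combine per-rank results across the cluster.
    double globalSum = 0.0;
    MPI_Allreduce(&boundarySum, &globalSum, 1, MPI_DOUBLE, MPI_SUM, MPI_COMM_WORLD);

    cudaFree(d_u);
    MPI_Finalize();
    return 0;
}
```

Something like `nvcc -Xcompiler -fopenmp` together with the MPI compiler wrapper builds this; the point is only to show where each level of parallelism hooks in.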

1 comment:

  1. Here's an example Parareal implementation in Python for the 3D Burgers' equation on GitHub; also the paper to go with the code.
