The company presents CP2K as an example: in this atomic and molecular simulation code, Hyper-Q delivers a 2.5x speedup without extra coding effort.
The test case, a small data set of 864 water molecules, is the kind of workload that is usually problematic for GPUs. Without Hyper-Q, only one MPI process runs on each GPU-equipped node, and the performance curve from 1 to 16 nodes is not much better than that of CPU-only runs.
With Hyper-Q, it is now possible to use the same number of MPI processes per node as in the CPU-only case, which means 16 MPI processes per GPU in this instance. This unlocks the full benefit of the GPU and yields the 2.5x speedup.
And the best part? No extra coding effort is necessary to enable Hyper-Q. All it takes is a Tesla K20 GPU, a CUDA 5 installation, and an environment variable set to let multiple MPI ranks share the GPU. Hyper-Q is then ready to use.
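To make the difference in process layout concrete, here is a minimal sketch of how node-local MPI ranks could be mapped onto GPUs in the two cases described above. The function name `assign_gpus` and the round-robin mapping are illustrative assumptions, not CP2K's or NVIDIA's actual device-selection logic, and the sketch only models the layout; it does not call MPI or CUDA.

```python
from collections import defaultdict
from typing import Dict, List


def assign_gpus(ranks_per_node: int, gpus_per_node: int) -> Dict[int, List[int]]:
    """Map each node-local MPI rank to a GPU index (hypothetical round-robin)."""
    sharing = defaultdict(list)
    for local_rank in range(ranks_per_node):
        # Assumption: ranks are spread over the available GPUs in rank order.
        sharing[local_rank % gpus_per_node].append(local_rank)
    return dict(sharing)


# Without Hyper-Q, as in the article: one MPI rank per GPU node.
print(assign_gpus(ranks_per_node=1, gpus_per_node=1))

# With Hyper-Q, as in the article: 16 MPI ranks share the node's single GPU.
print(assign_gpus(ranks_per_node=16, gpus_per_node=1))
```

In the second layout every rank submits its work to the same device, which is the situation Hyper-Q is meant to handle.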
NVIDIA says the Tesla K20 will be available by the end of the year.