NVIDIA has revealed some details about Hyper-Q, a new feature of the company's upcoming GK110-based Tesla K20 computing accelerator. The technology provides 32 work queues between the host and the GPU, so multiple MPI processes can run concurrently on the same GPU. According to NVIDIA, this maximizes GPU utilization and increases performance for thousands of legacy MPI applications without requiring a major code rewrite.
The company presents CP2K as an example: in this atomic and molecular simulation code, the use of Hyper-Q leads to a 2.5x speedup without extra coding effort.
The test case is a small data set of 864 water molecules, a size that is usually problematic for GPUs. Without Hyper-Q, only one MPI process runs on each node with GPUs, and the performance curve from 1 to 16 nodes is not much better than with CPU-only simulations.
With Hyper-Q, it is now possible to use the same number of MPI processes per node as in the CPU-only case, which means 16 MPI processes per GPU in this instance. This unlocks the full benefit of the GPU, leading to a speedup of 2.5x with Hyper-Q enabled.
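The arithmetic behind that claim can be sketched in a few lines of Python. This is only a toy model, not NVIDIA's scheduler: the function names and the round-robin assignment are assumptions made for illustration, and the only figures taken from the article are the 32 work queues and the 16 MPI processes per GPU.

```python
# Toy model of MPI ranks sharing one GPU through a fixed number of work queues.
# Assumption: ranks are spread round-robin over the queues. The real hardware
# scheduling is not described in the article and may differ.

HYPER_Q_WORK_QUEUES = 32  # figure quoted in the article


def assign_ranks_to_queues(num_ranks, num_queues=HYPER_Q_WORK_QUEUES):
    """Return a dict mapping each rank to a queue index (hypothetical helper)."""
    if num_ranks < 0 or num_queues < 1:
        raise ValueError("need num_ranks >= 0 and num_queues >= 1")
    return {rank: rank % num_queues for rank in range(num_ranks)}


def concurrent_ranks(num_ranks, num_queues=HYPER_Q_WORK_QUEUES):
    """Upper bound on ranks that can hold their own queue at the same time."""
    if num_ranks < 0 or num_queues < 1:
        raise ValueError("need num_ranks >= 0 and num_queues >= 1")
    return min(num_ranks, num_queues)


if __name__ == "__main__":
    ranks_per_gpu = 16  # the CP2K configuration described in the article
    mapping = assign_ranks_to_queues(ranks_per_gpu)
    print("distinct queues in use:", len(set(mapping.values())))
    print("concurrent ranks with 32 queues:", concurrent_ranks(ranks_per_gpu))
    print("concurrent ranks with 1 queue:", concurrent_ranks(ranks_per_gpu, 1))
```

In this model, 16 ranks fit within the 32 queues, so each rank gets a queue of its own, while a single queue would force all 16 ranks to take turns.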
And the best part? No extra coding effort is necessary to enable Hyper-Q. All it takes is a Tesla K20 GPU, a CUDA 5 installation, and an environment variable set to let multiple MPI ranks share the GPU – Hyper-Q is then ready to use.
NVIDIA says the Tesla K20 will be available by the end of the year.