NVIDIA and Mellanox Technologies introduced new software that increases cluster application performance by reducing communication latency by as much as 30 percent when servers equipped with NVIDIA Tesla GPUs communicate over Mellanox InfiniBand.
In a typical GPU-CPU server architecture, the CPU must initiate and manage memory transfers between the GPU and the InfiniBand network. The new software enables Tesla GPUs to transfer data to pinned system memory that a Mellanox InfiniBand adapter can read and transmit over the network directly, increasing overall system performance and efficiency.
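The data path described above can be sketched with the standard CUDA runtime API. This is a minimal illustration of the pinned-memory staging idea, not the announced software itself; the buffer size is a placeholder and the kernel work is elided:

```cuda
// Sketch: staging GPU results in pinned (page-locked) host memory.
// Because pinned pages cannot be swapped out, an RDMA-capable
// InfiniBand adapter can read them directly, without an extra
// CPU-managed staging copy.
#include <cuda_runtime.h>

int main(void) {
    const size_t nbytes = 1 << 20;  // 1 MiB, illustrative only
    float *d_buf = NULL, *h_pinned = NULL;

    // Device buffer for kernel output, plus a pinned host buffer.
    cudaMalloc((void **)&d_buf, nbytes);
    cudaHostAlloc((void **)&h_pinned, nbytes, cudaHostAllocDefault);

    // ... launch kernels that produce results in d_buf ...

    // Copy device results into the pinned region; from here the
    // InfiniBand HCA can transmit the data over the network while
    // the CPU stays out of the data path.
    cudaMemcpy(h_pinned, d_buf, nbytes, cudaMemcpyDeviceToHost);

    cudaFreeHost(h_pinned);
    cudaFree(d_buf);
    return 0;
}
```

Without pinned memory, the runtime would stage transfers through a pageable intermediate buffer under CPU control, which is the dependency the new software removes.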
"NVIDIA Tesla GPUs deliver large increases in performance across each node in a cluster, but in our production runs on TSUBAME 1 we have found that network communication becomes a bottleneck when using multiple GPUs," said Prof. Satoshi Matsuoka from Tokyo Institute of Technology. "Reducing the dependency on the CPU by using InfiniBand will deliver a major boost in performance in high performance GPU clusters, thanks to the work of NVIDIA and Mellanox, and will further enhance the architectural advances we will make in TSUBAME2.0."
"In GPU-based clusters, most of the compute intensive processing is running on the GPUs," said Gilad Shainer, director of high performance computing and technical marketing at Mellanox Technologies. "It's a natural evolution of the system architecture to enable GPUs to communicate more intelligently over InfiniBand. This helps create a computing platform that will enable future Exascale computing and dramatically increase performance for a broad spectrum of applications."
"Anyone who cares about performance in their datacenter uses InfiniBand," said Andy Keane, general manager, Tesla business at NVIDIA. "This new feature will further improve application performance on GPU-based clusters by reducing the dependency on the CPU for communicating over InfiniBand."
This software capability will be available in the NVIDIA CUDA(TM) architecture toolkit beginning in Q2 2010. It will work on existing Tesla S1070 1U computing systems and Tesla M1060 module-based clusters, as well as the new Tesla 20-series S2050 and S2070 1U systems.