Today NVIDIA announced the availability of the CUDA Toolkit 3.2 production release, which provides significant performance increases, new math libraries and advanced cluster management features for developers creating next-generation GPU-accelerated applications.
The CUDA Toolkit includes all the tools, libraries and documentation developers need to build CUDA C/C++ applications, and is the foundation for many other GPU computing language solutions. New features and significant performance enhancements in version 3.2 include:
Up to a 300 percent performance improvement in CUDA BLAS (CUBLAS) library routines, which run 8 times faster than the latest Intel Math Kernel Library (MKL)
CUDA FFT (CUFFT) library optimizations that deliver 2 to 20 times faster performance than the latest MKL
New CURAND library that generates random numbers 10 to 20 times faster than the latest MKL
New CUSPARSE library of sparse matrix routines that delivers 6 to 30 times faster performance than the latest MKL
A host of additional improvements to GPU debugging and performance analysis tools
In addition, the CUDA Toolkit 3.2 release includes H.264 encode/decode support, new Tesla Compute Cluster (TCC) integration, cluster management features, and support for the new 6GB NVIDIA Tesla and Quadro GPU products.