NVIDIA has published the CUDA 4.0 Toolkit. The new developer kit offers features that make parallel programming easier, along with enhanced C++ template libraries. Software developers can find more information on NVIDIA's CUDA page.
NVIDIA today announced the latest version of the NVIDIA(R) CUDA(R) Toolkit for developing parallel applications using NVIDIA GPUs.
The NVIDIA CUDA 4.0 Toolkit was designed to make parallel programming easier, and enable more developers to port their applications to GPUs. This has resulted in three main features:
- NVIDIA GPUDirect(TM) 2.0 Technology -- Offers support for
peer-to-peer communication among GPUs within a single server or
workstation. This enables easier and faster multi-GPU programming and
- Unified Virtual Addressing (UVA) -- Provides a single merged-memory
address space for the main system memory and the GPU memories,
enabling quicker and easier parallel programming.
- Thrust C++ Template Performance Primitives Libraries -- Provides a
collection of powerful open source C++ parallel algorithms and data
structures that ease programming for C++ developers. With Thrust,
routines such as parallel sorting are 5X to 100X faster than with the
Standard Template Library (STL) and Threading Building Blocks (TBB).
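As an illustration of the Thrust bullet above, the following is a minimal sketch of a GPU sort, not code from the announcement. The helper name sort_on_gpu is hypothetical; the Thrust calls themselves (thrust::device_vector, thrust::sort, thrust::is_sorted, thrust::copy) are part of the library. It assumes a CUDA-capable GPU and compilation with nvcc.

```cuda
#include <thrust/copy.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>

#include <cassert>
#include <vector>

// Hypothetical helper: sorts a host-side vector on the GPU and returns the result.
std::vector<int> sort_on_gpu(const std::vector<int>& input)
{
    // Constructing a device_vector from host iterators allocates GPU memory
    // and copies the data over; no explicit cudaMalloc/cudaMemcpy is needed.
    thrust::device_vector<int> d_vec(input.begin(), input.end());

    // Parallel sort on the device, with the same call shape as std::sort.
    thrust::sort(d_vec.begin(), d_vec.end());

    // Sanity check, also evaluated on the device.
    assert(thrust::is_sorted(d_vec.begin(), d_vec.end()));

    // Copy the sorted values back into ordinary host memory.
    std::vector<int> result(d_vec.size());
    thrust::copy(d_vec.begin(), d_vec.end(), result.begin());
    return result;
}
```

Calling sort_on_gpu({5, 3, 9, 1}) would be expected to return the values in ascending order. For very small inputs the transfer cost usually outweighs any speedup, so a sketch like this only pays off on large arrays.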
"Unified virtual addressing and faster GPU-to-GPU communication makes it easier for developers to take advantage of the parallel computing capability of GPUs," said John Stone, senior research programmer, University of Illinois, Urbana-Champaign.
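To make the GPUDirect and UVA features concrete, here is a hedged sketch of how a program might move data from the host to one GPU, on to a second GPU if one is present, and back again. The function name roundtrip_through_gpus is hypothetical. The runtime calls (cudaMemcpy with cudaMemcpyDefault, cudaDeviceCanAccessPeer, cudaDeviceEnablePeerAccess, cudaMemcpyPeer) are CUDA runtime API functions. The sketch assumes a 64-bit system with UVA support and keeps error handling to a minimum.

```cuda
#include <cuda_runtime.h>

#include <cassert>
#include <vector>

// Hypothetical helper: host -> GPU 0 -> (GPU 1, if present) -> host.
// Returns the data read back, or an empty vector if any step failed.
std::vector<float> roundtrip_through_gpus(const std::vector<float>& input)
{
    assert(!input.empty());
    const size_t bytes = input.size() * sizeof(float);

    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess || device_count < 1)
        return {};  // no usable GPU

    float* d0 = nullptr;  // buffer on GPU 0
    float* d1 = nullptr;  // buffer on GPU 1, if there is one

    // UVA: with cudaMemcpyDefault the runtime infers the copy direction
    // from the pointer values instead of an explicit HostToDevice flag.
    bool ok = cudaSetDevice(0) == cudaSuccess
           && cudaMalloc(&d0, bytes) == cudaSuccess
           && cudaMemcpy(d0, input.data(), bytes, cudaMemcpyDefault) == cudaSuccess;

    const float* last = d0;
    if (ok && device_count >= 2) {
        int can_access = 0;
        cudaDeviceCanAccessPeer(&can_access, 1, 0);

        ok = cudaSetDevice(1) == cudaSuccess
          && cudaMalloc(&d1, bytes) == cudaSuccess;

        if (ok && can_access) {
            // Let GPU 1 address GPU 0's memory directly (flags must be 0).
            // A failure here is not fatal: cudaMemcpyPeer still works, it
            // just cannot take the direct peer-to-peer path.
            cudaDeviceEnablePeerAccess(0, 0);
            cudaGetLastError();  // clear any non-fatal error status
        }

        // GPU-to-GPU copy: destination pointer and device, source pointer and device.
        ok = ok && cudaMemcpyPeer(d1, 1, d0, 0, bytes) == cudaSuccess;
        last = d1;
    }

    std::vector<float> output(input.size());
    ok = ok && cudaMemcpy(output.data(), last, bytes, cudaMemcpyDefault) == cudaSuccess;

    cudaFree(d1);  // freeing a null pointer is a no-op
    cudaFree(d0);
    return ok ? output : std::vector<float>{};
}
```

On a single-GPU machine the peer copy is simply skipped, so the same function illustrates UVA on its own. Whether the peer copy actually bypasses host memory depends on the hardware and topology reported by cudaDeviceCanAccessPeer.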
"Having access to GPU computing through the standard template interface greatly increases productivity for a wide range of tasks, from simple cashflow generation to complex computations with Libor market models, variable annuities or CVA adjustments," said Peter Decrem, director of Rates Products at Quantifi. "The Thrust C++ library has lowered the barrier of entry significantly by taking care of low-level functionality like memory access and allocation, allowing the financial engineer to focus on algorithm development in a GPU-enhanced environment."
The CUDA 4.0 architecture release includes a number of other key features and capabilities, including:
- MPI Integration with CUDA Applications -- Modified MPI implementations
automatically move data from and to the GPU memory over Infiniband
when an application does an MPI send or receive call.
- Multi-thread Sharing of GPUs -- Multiple CPU host threads can share
contexts on a single GPU, making it easier to share a single GPU by
- Multi-GPU Sharing by Single CPU Thread -- A single CPU host thread can
access all GPUs in a system. Developers can easily coordinate work
across multiple GPUs for tasks such as "halo" exchange in
- New NPP Image and Computer Vision Library -- A rich set of image
transformation operations that enable rapid development of imaging and
computer vision applications.
- New and Improved Capabilities
    - Auto performance analysis in the Visual Profiler
    - New features in cuda-gdb and added support for MacOS
    - Added support for C++ features like new/delete and virtual functions
    - New GPU binary disassembler
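The "Multi-GPU Sharing by Single CPU Thread" item above can be sketched as follows. This is an illustrative example under stated assumptions, not NVIDIA's sample code: the kernel fill and the helper fill_on_every_gpu are hypothetical names, while cudaGetDeviceCount, cudaSetDevice, cudaMalloc, cudaMemcpy and cudaFree are standard CUDA runtime calls. Error checking is omitted for brevity.

```cuda
#include <cuda_runtime.h>

#include <cassert>
#include <vector>

// Hypothetical kernel: writes `value` into every element of `data`.
__global__ void fill(int* data, int n, int value)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = value;
}

// Hypothetical helper: one host thread drives every GPU in the system.
// Returns one vector per GPU, each filled with that GPU's device index.
std::vector<std::vector<int>> fill_on_every_gpu(int n)
{
    assert(n > 0);  // a zero-sized grid is not a valid launch configuration

    int device_count = 0;
    if (cudaGetDeviceCount(&device_count) != cudaSuccess) device_count = 0;

    std::vector<int*> buffers(device_count, nullptr);

    // Phase 1: switch between GPUs with cudaSetDevice and start work on each.
    // Kernel launches are asynchronous, so the GPUs can run concurrently.
    for (int dev = 0; dev < device_count; ++dev) {
        cudaSetDevice(dev);
        cudaMalloc(&buffers[dev], n * sizeof(int));
        fill<<<(n + 255) / 256, 256>>>(buffers[dev], n, dev);
    }

    // Phase 2: collect results. This cudaMemcpy runs after the kernel
    // launched on the same device has finished.
    std::vector<std::vector<int>> results(device_count, std::vector<int>(n));
    for (int dev = 0; dev < device_count; ++dev) {
        cudaSetDevice(dev);
        cudaMemcpy(results[dev].data(), buffers[dev], n * sizeof(int),
                   cudaMemcpyDeviceToHost);
        cudaFree(buffers[dev]);
    }
    return results;
}
```

Splitting the work into a launch phase and a collect phase is a design choice for this sketch: it keeps all GPUs busy at once instead of waiting for each device to finish before starting the next.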
A release candidate of CUDA Toolkit 4.0 will be available free of charge beginning March 4, 2011, by enrolling in the CUDA Registered Developer Program at: www.nvidia.com/paralleldeveloper. The CUDA Registered Developer Program provides a wealth of tools, resources, and information for parallel application developers to maximize the potential of CUDA.