According to Nvidia, it has been designed with parallel programming in mind, so developers can get more of their code ported to GPUs. There's GPUDirect 2.0 in the new release, which supports peer-to-peer (P2P) communication between GPUs in a single workstation, so they can swap data directly rather than bouncing it through host memory. That'll give speed a kick up the jacksie.
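For the curious, a P2P copy looks something like this with the CUDA runtime API. This is a minimal, untested sketch assuming two P2P-capable GPUs (devices 0 and 1) in one box, with error checking left out:

```cuda
// Sketch: direct GPU-to-GPU copy using the CUDA 4.0 peer-to-peer API.
// Assumes two P2P-capable GPUs on the same PCIe root; no error checking.
#include <cuda_runtime.h>
#include <cstdio>

int main() {
    int canAccess = 0;
    cudaDeviceCanAccessPeer(&canAccess, 0, 1);  // can GPU 0 reach GPU 1?
    if (!canAccess) { printf("No P2P between GPU 0 and 1\n"); return 1; }

    float *src, *dst;
    const size_t bytes = 1 << 20;

    cudaSetDevice(0);
    cudaDeviceEnablePeerAccess(1, 0);   // let GPU 0 see GPU 1's memory
    cudaMalloc(&dst, bytes);

    cudaSetDevice(1);
    cudaMalloc(&src, bytes);

    // The copy travels GPU-to-GPU, skipping the host bounce buffer.
    cudaMemcpyPeer(dst, 0, src, 1, bytes);
    return 0;
}
```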
Then there’s Unified Virtual Addressing (UVA), which Nvidia tells us provides a single merged address space covering main system memory and the memory of every GPU in the box. UVA, Nvidia hopes, will make parallel programming easier for developers. CUDA 4.0 also sports Thrust, a C++ template library of performance primitives. In non-Nvidia terms, it’s a collection of open-source C++ parallel algorithms, making life a tad easier for C++ developers. Thrust, says Nvidia, makes parallel sorting up to 100 times faster than with the C++ Standard Template Library (STL) or Intel’s Threading Building Blocks (TBB).
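If you've used the STL, Thrust will look familiar. A sketch of that hundred-times-faster sort, lifted from the library's standard usage pattern:

```cuda
// Sketch: Thrust's STL-style parallel sort running on the GPU.
// thrust::device_vector lives in GPU memory; the sort happens there too.
#include <thrust/host_vector.h>
#include <thrust/device_vector.h>
#include <thrust/sort.h>
#include <thrust/copy.h>
#include <cstdlib>

int main() {
    thrust::host_vector<int> h(1 << 20);            // a million ints on the host
    for (size_t i = 0; i < h.size(); ++i) h[i] = rand();

    thrust::device_vector<int> d = h;               // copy to the GPU
    thrust::sort(d.begin(), d.end());               // parallel sort on the device
    thrust::copy(d.begin(), d.end(), h.begin());    // results back to the host
    return 0;
}
```

Swap `thrust::` for `std::` and it could be plain STL code, which is rather the point.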
C++ support will also include the new and delete operators and virtual functions in device code.
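Device-side new/delete means a kernel can manage its own heap memory, something C++ developers take for granted on the CPU. A hedged sketch, assuming a Fermi-class GPU with a device heap (kernel and variable names are ours):

```cuda
// Sketch: per-thread new/delete inside a kernel, as CUDA 4.0's C++ support allows.
#include <cuda_runtime.h>

__global__ void scratch_kernel(int *out) {
    int *scratch = new int[4];          // heap allocation on the GPU itself
    for (int i = 0; i < 4; ++i) scratch[i] = threadIdx.x + i;
    out[threadIdx.x] = scratch[3];
    delete[] scratch;                   // freed on the device as well
}

int main() {
    int *out;
    cudaMalloc(&out, 32 * sizeof(int));
    scratch_kernel<<<1, 32>>>(out);
    cudaDeviceSynchronize();
    cudaFree(out);
    return 0;
}
```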
Aside from the Thrusting in a parallel manner, Nvidia is excited about MPI integration with CUDA – which will automatically bung data to and from the GPU’s memory over InfiniBand when an application sends or receives MPI messages. Then there’s multi-thread sharing of GPUs, so multiple CPU threads can share a single context on one GPU.
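The upshot of the MPI integration is that a device pointer can be handed straight to an MPI call and the library does the shuffling. A sketch, assuming a CUDA-aware MPI build and two ranks, each with a GPU:

```cuda
// Sketch: passing a GPU pointer straight to MPI, which UVA plus a
// CUDA-aware MPI build makes possible -- no manual staging through host memory.
#include <mpi.h>
#include <cuda_runtime.h>

int main(int argc, char **argv) {
    MPI_Init(&argc, &argv);
    int rank;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    float *d_buf;                                   // device memory, not host
    cudaMalloc(&d_buf, 1024 * sizeof(float));

    // The MPI library recognises d_buf as device memory via UVA and
    // moves the data itself, over InfiniBand where available.
    if (rank == 0)
        MPI_Send(d_buf, 1024, MPI_FLOAT, 1, 0, MPI_COMM_WORLD);
    else if (rank == 1)
        MPI_Recv(d_buf, 1024, MPI_FLOAT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);

    cudaFree(d_buf);
    MPI_Finalize();
    return 0;
}
```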
CUDA 4.0 will also pack a new GPU binary disassembler and improved support for Mac OS X, we’re promised.
Nvidia thinks the release is the most important thing in the world, so be sure to enroll as a CUDA registered developer. The toolkit is free to pilfer on the 4th of March if you do.