For older releases, see the CUDA Toolkit Release Archive
Release Highlights
Easier Application Porting
Share GPUs across multiple threads
Use all GPUs in the system concurrently from a single host thread
No-copy pinning of system memory, a faster alternative to cudaMallocHost()
C++ new/delete and support for virtual functions
Support for inline PTX assembly
Thrust library of templated performance primitives such as sort, reduce, etc.
NVIDIA Performance Primitives (NPP) library for image/video processing
Layered Textures for working with same size/format textures at larger sizes and higher performance

Faster Multi-GPU Programming
Unified Virtual Addressing
GPUDirect v2.0 support for Peer-to-Peer Communication

New & Improved Developer Tools
Automated Performance Analysis in Visual Profiler
C++ debugging in cuda-gdb
GPU binary disassembler for Fermi architecture (cuobjdump)

Please refer to the Release Notes and Getting Started Guides for more information.
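
Judging by the feature list, this is not merely an update tied to a new hardware generation; it is useful on all existing GPUs.
Especially praiseworthy is that multiple GPUs can now exchange data directly over PCIe, so many applications no longer pay for an extra copy through host memory that used to eat into their PCIe bandwidth.
Devices such as PCIe capture cards should also, before long, be able to exchange data with the GPU directly over PCIe without passing it through main memory. This is a great step forward!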
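Unified Virtual Addressing (UVA): host memory and the memory of every GPU share a single virtual address space.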
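One practical consequence of a single address space is that the runtime can work out the direction of a copy from the pointers themselves. The following is a minimal sketch, not NVIDIA's sample code: the function name `uva_roundtrip` and its return convention (0 for success or "skipped", 1 for failure) are assumptions made for illustration. It uses pinned host buffers from `cudaMallocHost` and passes `cudaMemcpyDefault` instead of an explicit direction.

```cuda
#include <cuda_runtime.h>
#include <cstring>

// Hypothetical helper: copies n floats host -> device -> host using
// cudaMemcpyDefault, which relies on UVA to infer the copy direction.
// Returns 0 on success (or when no UVA-capable GPU is present), 1 on failure.
int uva_roundtrip(size_t n) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        cudaGetLastError();  // clear the error state; nothing to demonstrate
        return 0;
    }

    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) return 1;
    if (!prop.unifiedAddressing) return 0;  // device 0 does not support UVA

    const size_t bytes = n * sizeof(float);
    float *h_src = NULL, *h_dst = NULL, *d_buf = NULL;

    bool good = cudaSetDevice(0) == cudaSuccess
             && cudaMallocHost((void**)&h_src, bytes) == cudaSuccess
             && cudaMallocHost((void**)&h_dst, bytes) == cudaSuccess
             && cudaMalloc((void**)&d_buf, bytes) == cudaSuccess;

    if (good) {
        for (size_t i = 0; i < n; ++i) {
            h_src[i] = (float)i;
            h_dst[i] = -1.0f;
        }
        // No cudaMemcpyHostToDevice / cudaMemcpyDeviceToHost here:
        // the runtime decides from the pointer values.
        good = cudaMemcpy(d_buf, h_src, bytes, cudaMemcpyDefault) == cudaSuccess
            && cudaMemcpy(h_dst, d_buf, bytes, cudaMemcpyDefault) == cudaSuccess;
        good = good && std::memcmp(h_src, h_dst, bytes) == 0;
    }

    if (h_src) cudaFreeHost(h_src);
    if (h_dst) cudaFreeHost(h_dst);
    if (d_buf) cudaFree(d_buf);
    return good ? 0 : 1;
}
```

Calling `uva_roundtrip(1024)` from a `main` function and compiling with `nvcc` should be enough to try it; on a machine without a UVA-capable GPU the helper simply reports "skipped" by returning 0.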
GPUDirect v2.0 is a new feature: GPUs can transfer data to each other directly over PCIe instead of staging it through host memory.
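A peer-to-peer transfer of this kind can be sketched with the runtime API roughly as follows. This is an illustrative sketch under stated assumptions: the helper name `peer_copy_check`, the choice of devices 0 and 1, and the return convention (0 for success or "skipped", 1 for failure) are not from the release notes. Peer access is enabled only when `cudaDeviceCanAccessPeer` reports it is possible in both directions; `cudaMemcpyPeer` is still expected to work without it, but the runtime may then stage the copy through host memory.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Hypothetical helper: fills a buffer on device 0, copies it to device 1
// with cudaMemcpyPeer, and verifies the result on the host.
// Returns 0 on success (or when fewer than two GPUs exist), 1 on failure.
int peer_copy_check(size_t bytes) {
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count < 2) {
        cudaGetLastError();  // clear the error state; nothing to demonstrate
        return 0;
    }

    // Ask whether each device can map the other's memory directly.
    int can01 = 0, can10 = 0;
    cudaDeviceCanAccessPeer(&can01, 0, 1);
    cudaDeviceCanAccessPeer(&can10, 1, 0);
    const bool direct = can01 && can10;

    unsigned char *buf0 = NULL, *buf1 = NULL;
    std::vector<unsigned char> host(bytes, 0);

    bool good = cudaSetDevice(0) == cudaSuccess
             && cudaMalloc((void**)&buf0, bytes) == cudaSuccess
             && cudaMemset(buf0, 0x5A, bytes) == cudaSuccess;
    if (good && direct) cudaDeviceEnablePeerAccess(1, 0);  // device 0 -> device 1

    good = good && cudaSetDevice(1) == cudaSuccess
                && cudaMalloc((void**)&buf1, bytes) == cudaSuccess;
    if (good && direct) cudaDeviceEnablePeerAccess(0, 0);  // device 1 -> device 0
    cudaGetLastError();  // ignore "peer access already enabled"

    // Copy device 0 -> device 1, then bring the result back for checking.
    good = good && cudaMemcpyPeer(buf1, 1, buf0, 0, bytes) == cudaSuccess;
    good = good && cudaMemcpy(host.data(), buf1, bytes,
                              cudaMemcpyDeviceToHost) == cudaSuccess;
    for (size_t i = 0; good && i < bytes; ++i) {
        if (host[i] != 0x5A) good = false;
    }

    cudaSetDevice(0);
    if (buf0) cudaFree(buf0);
    cudaSetDevice(1);
    if (buf1) cudaFree(buf1);
    return good ? 0 : 1;
}
```

Calling `peer_copy_check(1 << 20)` from `main` exercises a 1 MiB transfer. Whether the copy actually travels GPU to GPU depends on the hardware and PCIe topology, which is why the sketch queries `cudaDeviceCanAccessPeer` first rather than assuming it.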
Support for C++ templates
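NVIDIA's hardware has not changed much this year, but the arrival of CUDA 4.0 is bound to give existing hardware a new lease of life.
Many applications that move large volumes of data can now be supported well!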