CUDA 4.0 Released

Technology, 2022-05-20

CUDA Toolkit 4.0 RC (March 2011)

For older releases, see the CUDA Toolkit Release Archive

Release Highlights

Easier Application Porting

- Share GPUs across multiple threads
- Use all GPUs in the system concurrently from a single host thread
- No-copy pinning of system memory, a faster alternative to cudaMallocHost()
- C++ new/delete and support for virtual functions
- Support for inline PTX assembly
- Thrust library of templated performance primitives such as sort, reduce, etc.
- NVIDIA Performance Primitives (NPP) library for image/video processing
- Layered Textures for working with same size/format textures at larger sizes and higher performance
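To make the "no-copy pinning" item concrete, here is a minimal sketch that page-locks a buffer obtained from plain malloc() in place with cudaHostRegister(), rather than allocating a second buffer with cudaMallocHost() and copying into it. The buffer size and the CHECK macro are illustrative assumptions, and the code is written against a recent toolkit: CUDA 4.0 itself expected the registered range to be page-aligned, and the cudaHostRegisterDefault flag name may not exist in that release.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main() {
    const size_t n = 1 << 20;                 // assumed size: 1M floats (4 MiB)
    const size_t bytes = n * sizeof(float);

    // An ordinary host buffer, as existing application code would own it.
    float *host = static_cast<float *>(malloc(bytes));
    if (!host) return EXIT_FAILURE;
    for (size_t i = 0; i < n; ++i) host[i] = 1.0f;

    // Page-lock the existing buffer in place; no second allocation, no copy.
    CHECK(cudaHostRegister(host, bytes, cudaHostRegisterDefault));

    float *dev = nullptr;
    CHECK(cudaMalloc((void **)&dev, bytes));

    // Transfers from the registered range now use pinned memory.
    CHECK(cudaMemcpy(dev, host, bytes, cudaMemcpyHostToDevice));
    printf("copied %zu bytes from a registered malloc() buffer\n", bytes);

    CHECK(cudaFree(dev));
    CHECK(cudaHostUnregister(host));          // unpin before freeing
    free(host);
    return 0;
}
```

The appeal for porting is that the allocation site in the existing code does not have to change; only a register/unregister pair is added around the lifetime of the buffer.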

Faster Multi-GPU Programming

- Unified Virtual Addressing
- GPUDirect v2.0 support for Peer-to-Peer Communication

New & Improved Developer Tools

- Automated Performance Analysis in Visual Profiler
- C++ debugging in cuda-gdb
- GPU binary disassembler for Fermi architecture (cuobjdump)

Please refer to the Release Notes and Getting Started Guides for more information.

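Judging by the feature list, this is not simply an update tied to a new hardware generation; it brings improvements that existing cards can use as well.

Especially welcome is that multiple GPUs can exchange data directly over PCIe, so many applications no longer have to pay for an extra round trip through host memory.

Devices such as PCIe capture cards should, in the near future, also be able to exchange data with the GPU directly over PCIe instead of passing it through main memory. That is a big step forward!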

Unified Virtual Addressing
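As a rough illustration of what a unified address space means for application code, the sketch below copies data with cudaMemcpyDefault, letting the runtime infer the direction from the pointer values instead of being told host-to-device or device-to-host. The buffer size and the CHECK macro are assumptions made for the example; it also assumes device 0 reports unifiedAddressing, which in practice means a 64-bit process on a Fermi-class or newer GPU.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main() {
    cudaDeviceProp prop;
    CHECK(cudaGetDeviceProperties(&prop, 0));
    if (!prop.unifiedAddressing) {
        printf("Device 0 does not report unified addressing.\n");
        return 0;
    }

    const int n = 1024;                       // assumed element count
    const size_t bytes = n * sizeof(int);
    int *h = nullptr, *d = nullptr;
    CHECK(cudaMallocHost((void **)&h, bytes)); // pinned host memory
    CHECK(cudaMalloc((void **)&d, bytes));     // device memory
    for (int i = 0; i < n; ++i) h[i] = i;

    // Host -> device: the direction is inferred from the pointers.
    CHECK(cudaMemcpy(d, h, bytes, cudaMemcpyDefault));

    for (int i = 0; i < n; ++i) h[i] = 0;     // clear before the round trip

    // Device -> host: same call, same flag, direction inferred again.
    CHECK(cudaMemcpy(h, d, bytes, cudaMemcpyDefault));

    int errors = 0;
    for (int i = 0; i < n; ++i) errors += (h[i] != i);
    printf("round trip finished with %d mismatches\n", errors);

    CHECK(cudaFree(d));
    CHECK(cudaFreeHost(h));
    return 0;
}
```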

GPUDirect v2.0 is a new feature that lets data move directly between devices over PCIe instead of being staged through host memory.
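A minimal sketch of the peer-to-peer path, assuming a machine with at least two GPUs numbered 0 and 1: it checks whether the two devices can address each other, enables peer access in both directions, and copies a buffer from device 0 to device 1 with cudaMemcpyPeer(). The buffer size, fill pattern, and CHECK macro are illustrative choices, not anything prescribed by the release.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Illustrative helper (not part of CUDA): abort if a runtime call fails.
#define CHECK(call)                                              \
    do {                                                         \
        cudaError_t err_ = (call);                               \
        if (err_ != cudaSuccess) {                               \
            fprintf(stderr, "%s failed: %s\n", #call,            \
                    cudaGetErrorString(err_));                   \
            exit(EXIT_FAILURE);                                  \
        }                                                        \
    } while (0)

int main() {
    int count = 0;
    CHECK(cudaGetDeviceCount(&count));
    if (count < 2) {
        printf("This sketch needs at least two GPUs.\n");
        return 0;
    }

    // Ask whether each device can address the other's memory directly.
    int can01 = 0, can10 = 0;
    CHECK(cudaDeviceCanAccessPeer(&can01, 0, 1));
    CHECK(cudaDeviceCanAccessPeer(&can10, 1, 0));
    const bool direct = can01 && can10;

    const size_t bytes = 1 << 20;             // assumed size: 1 MiB
    unsigned char *buf0 = nullptr, *buf1 = nullptr;

    CHECK(cudaSetDevice(0));
    CHECK(cudaMalloc((void **)&buf0, bytes));
    CHECK(cudaMemset(buf0, 0x5A, bytes));     // recognizable fill pattern
    if (direct) { CHECK(cudaDeviceEnablePeerAccess(1, 0)); }

    CHECK(cudaSetDevice(1));
    CHECK(cudaMalloc((void **)&buf1, bytes));
    if (direct) { CHECK(cudaDeviceEnablePeerAccess(0, 0)); }

    // Device 0 -> device 1. With peer access enabled the data can move GPU
    // to GPU; without it the copy is still valid but may be staged via host.
    CHECK(cudaMemcpyPeer(buf1, 1, buf0, 0, bytes));

    // Read one byte back from device 1 to confirm the data arrived.
    unsigned char first = 0;
    CHECK(cudaMemcpy(&first, buf1, 1, cudaMemcpyDeviceToHost));
    printf("peer access: %s, first byte on device 1: 0x%02X\n",
           direct ? "yes" : "no", first);

    CHECK(cudaFree(buf1));
    CHECK(cudaSetDevice(0));
    CHECK(cudaFree(buf0));
    return 0;
}
```

Whether peer access is actually available depends on the GPU generation and on how the cards are attached to the PCIe topology, which is why the sketch queries it at run time rather than assuming it.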

C++ template support
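One way to see the templated style in practice is the Thrust library mentioned in the release highlights. The sketch below uses thrust::sort and thrust::reduce through a small function template; sort_and_sum is a hypothetical helper written for this example, not a Thrust API, and the input values are arbitrary.

```cuda
#include <cstdio>
#include <thrust/device_vector.h>
#include <thrust/functional.h>
#include <thrust/reduce.h>
#include <thrust/sort.h>

// Hypothetical helper: sort a device vector in place and return the sum of
// its elements. The same source works for any arithmetic element type T.
template <typename T>
T sort_and_sum(thrust::device_vector<T> &v) {
    thrust::sort(v.begin(), v.end());
    return thrust::reduce(v.begin(), v.end(), T(0), thrust::plus<T>());
}

int main() {
    const int   raw_i[] = {5, 3, 8, 1};
    const float raw_f[] = {2.5f, 0.5f, 1.0f};

    // Constructing from a host range copies the data to the GPU.
    thrust::device_vector<int>   di(raw_i, raw_i + 4);
    thrust::device_vector<float> df(raw_f, raw_f + 3);

    const int   sum_i = sort_and_sum(di);
    const float sum_f = sort_and_sum(df);

    // Element access on a device_vector copies that element back to the host.
    printf("int:   smallest=%d sum=%d\n", static_cast<int>(di[0]), sum_i);
    printf("float: smallest=%.1f sum=%.1f\n",
           static_cast<float>(df[0]), sum_f);
    return 0;
}
```

The point of the template is that the int and float instantiations share one definition while each still dispatches to GPU sort and reduction code specialized for its element type.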

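NVIDIA's hardware has not changed much this year, but CUDA 4.0 will surely breathe new life into the hardware already in the field.

Many applications that transfer large amounts of data can now be supported well!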
