C/C++教程

c++ cuda拷贝内存

本文主要是介绍c++ cuda拷贝内存,对大家解决编程问题具有一定的参考价值,需要的程序猿们随着小编来一起学习吧!

https://developer.nvidia.com/zh-cn/blog/how-overlap-data-transfers-cuda-cc/

分批拷贝:

for (int i = 0; i < nStreams; ++i) {
  int offset = i * streamSize;
  cudaMemcpyAsync(&d_a[offset], &a[offset],
                  streamBytes, cudaMemcpyHostToDevice, cudaMemcpyHostToDevice, stream[i]);
}

for (int i = 0; i < nStreams; ++i) {
  int offset = i * streamSize;
  kernel<<<streamSize/blockSize, blockSize, 0, stream[i]>>>(d_a, offset);
}

for (int i = 0; i < nStreams; ++i) {
  int offset = i * streamSize;
  cudaMemcpyAsync(&a[offset], &d_a[offset],
                  streamBytes, cudaMemcpyDeviceToHost, cudaMemcpyDeviceToHost, stream[i]);
}

这篇关于c++ cuda拷贝内存的文章就介绍到这儿,希望我们推荐的文章对大家有所帮助,也希望大家多多支持为之网!