I have a question about allocating pinned memory.
I am currently using CUDA to process large amounts of data.
To reduce the execution time, I found that it is necessary to overlap memory copies with kernel execution.
After searching some texts and web pages on how to overlap memory copies with kernel launches, I noticed that the host memory has to be allocated with cudaMallocHost, which allocates it as pinned (page-locked) memory.
When the host data is an integer or array type, it was easy to make pinned memory.
Just like this:
    // create the streams
    cudaStream_t* streams = (cudaStream_t*)malloc(MAX_num_stream * sizeof(cudaStream_t));
    for(i=0; i<MAX_num_stream; i++)
        cudaStreamCreate(&(streams[i]));
    // allocate the host buffer as pinned memory
    cudaMallocHost(&departure, its_size);
    // issue the copy and the kernel for each chunk in its own stream
    for(n=1; ... ; n++){
        cudaMemcpyAsync( ... streams[n]);
        kernel <<< ... , ... , ... , streams[n] >>> (...);
    }
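For reference, here is a fuller sketch of the array case above, to make clear what I mean by overlapping. The kernel `scale`, the function `run_overlap_demo`, the stream count, and the chunk size are all placeholders made up for illustration, not my real code, and I am assuming the data splits evenly across the streams.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel: doubles every element of one chunk.
__global__ void scale(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2.0f;
}

// Returns 0 on success, 1 on any CUDA error or wrong result.
int run_overlap_demo() {
    const int NUM_STREAMS = 4;            // assumed stream count
    const int CHUNK = 1 << 20;            // assumed elements per stream
    const int N = NUM_STREAMS * CHUNK;
    const size_t chunkBytes = CHUNK * sizeof(float);

    cudaStream_t streams[NUM_STREAMS];
    for (int i = 0; i < NUM_STREAMS; i++)
        if (cudaStreamCreate(&streams[i]) != cudaSuccess) return 1;

    // Pinned host buffer: needed for cudaMemcpyAsync to overlap with kernels.
    float* h_data = nullptr;
    if (cudaMallocHost(&h_data, N * sizeof(float)) != cudaSuccess) return 1;
    for (int i = 0; i < N; i++) h_data[i] = 1.0f;

    float* d_data = nullptr;
    if (cudaMalloc(&d_data, N * sizeof(float)) != cudaSuccess) return 1;

    // Each stream copies its chunk in, runs the kernel, and copies it back.
    // Work queued in different streams is allowed to overlap.
    for (int s = 0; s < NUM_STREAMS; s++) {
        size_t off = (size_t)s * CHUNK;
        cudaMemcpyAsync(d_data + off, h_data + off, chunkBytes,
                        cudaMemcpyHostToDevice, streams[s]);
        scale<<<(CHUNK + 255) / 256, 256, 0, streams[s]>>>(d_data + off, CHUNK);
        cudaMemcpyAsync(h_data + off, d_data + off, chunkBytes,
                        cudaMemcpyDeviceToHost, streams[s]);
    }

    // Wait for all streams before reading the results on the host.
    int status = (cudaDeviceSynchronize() == cudaSuccess) ? 0 : 1;
    if (status == 0 && (h_data[0] != 2.0f || h_data[N - 1] != 2.0f)) status = 1;

    for (int i = 0; i < NUM_STREAMS; i++) cudaStreamDestroy(streams[i]);
    cudaFree(d_data);
    cudaFreeHost(h_data);   // memory from cudaMallocHost is freed with cudaFreeHost
    return status;
}

int main() {
    int status = run_overlap_demo();
    printf("overlap demo %s\n", status == 0 ? "succeeded" : "failed");
    return status;
}
```

The point is that `h_data` comes straight from cudaMallocHost, which is easy for a raw array like this.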
But in my case, the host memory is stored in a vector.
And I can't find a way to turn the vector's host memory into pinned memory using cudaMallocHost.