Coalesced global memory writes using a hash

My question concerns coalesced global writes to a dynamically changing set of array elements in CUDA. Consider the following kernel:

__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

Here, the first n elements of the array hash contain the indices of odata to be updated from the first n elements of idata. Obviously, this leads to a terrible, terrible lack of coalescing. In my code, the hash on one kernel invocation is completely unrelated to the hash on another (and other kernels update the data in other ways), so simply reordering the data to optimize this particular kernel is not an option.

Is there any feature in CUDA that would let me improve performance in this situation? I hear a lot of talk about texture memory, but I have not been able to translate what I have read into a solution to this problem.

2 answers

Texturing is a read-only mechanism, so it cannot directly improve the performance of scattered writes to global memory. If you were instead using the hash on the read side, like this:

odata[i] = idata[hash[i]]; 

(perhaps your algorithm can be transformed into this form?)

Then there might be some benefit in considering the texture mechanism. (Your example looks 1D in nature.)
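As a rough illustration of the gather form above, here is a minimal sketch that reads idata through a 1D texture object while writing odata in a coalesced way. The names gather_kernel and gather_with_texture are hypothetical, the texture object API is my choice rather than something the question specifies, and error checking is omitted for brevity. Note that hash here means "where to read from", which is not the same mapping as in the original scatter kernel.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Gather form: thread i writes odata[i] (coalesced) and reads idata at a
// scattered index through the texture path.
__global__ void gather_kernel(int n, int *odata, cudaTextureObject_t idata_tex,
                              const int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[i] = tex1Dfetch<int>(idata_tex, hash[i]);
}

// Hypothetical host wrapper: returns out[i] = idata[hash[i]].
// Assumes every hash[i] is a valid index into idata.
std::vector<int> gather_with_texture(const std::vector<int> &idata,
                                     const std::vector<int> &hash)
{
  int n = static_cast<int>(hash.size());
  std::vector<int> out(n);
  if (n == 0 || idata.empty()) return out;

  int *d_idata = nullptr, *d_odata = nullptr, *d_hash = nullptr;
  size_t in_bytes = idata.size() * sizeof(int);
  size_t n_bytes = n * sizeof(int);
  cudaMalloc(&d_idata, in_bytes);
  cudaMalloc(&d_odata, n_bytes);
  cudaMalloc(&d_hash, n_bytes);
  cudaMemcpy(d_idata, idata.data(), in_bytes, cudaMemcpyHostToDevice);
  cudaMemcpy(d_hash, hash.data(), n_bytes, cudaMemcpyHostToDevice);

  // Describe the linear device buffer that backs the texture.
  cudaResourceDesc res = {};
  res.resType = cudaResourceTypeLinear;
  res.res.linear.devPtr = d_idata;
  res.res.linear.desc = cudaCreateChannelDesc<int>();
  res.res.linear.sizeInBytes = in_bytes;

  // Return raw element values, with no normalization or filtering.
  cudaTextureDesc tex_desc = {};
  tex_desc.readMode = cudaReadModeElementType;

  cudaTextureObject_t tex = 0;
  cudaCreateTextureObject(&tex, &res, &tex_desc, nullptr);

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  gather_kernel<<<blocks, threads>>>(n, d_odata, tex, d_hash);
  cudaMemcpy(out.data(), d_odata, n_bytes, cudaMemcpyDeviceToHost);

  cudaDestroyTextureObject(tex);
  cudaFree(d_idata);
  cudaFree(d_odata);
  cudaFree(d_hash);
  return out;
}
```

Whether this actually beats a plain global load depends on the hardware and on how much locality the hash values have, so it is worth profiling both variants.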

You could also make sure that the shared memory / L1 split is configured in favour of the cache. This will not help much with scattered writes, though.
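For reference, the shared memory / L1 preference can be requested per kernel with cudaFuncSetCacheConfig. The sketch below assumes the kernel from the question; the helper name prefer_l1_for_scatter is hypothetical. The setting is only a hint, and it has an effect only on devices where the L1 and shared memory split is configurable.

```cuda
#include <cuda_runtime.h>

// The scatter kernel from the question, repeated so the sketch is self-contained.
__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

// Hypothetical helper: ask the runtime to favour L1 cache over shared memory
// for this kernel, which uses no shared memory of its own.
cudaError_t prefer_l1_for_scatter()
{
  return cudaFuncSetCacheConfig(kernel, cudaFuncCachePreferL1);
}
```

Call the helper once before launching the kernel and check the returned cudaError_t.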


? , , 1K 0 8K odata.

