Coalesced global memory writes using a hash

Question

My question is about coalesced global writes to a dynamically changing set of array elements in CUDA. Consider the following kernel:

__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

Here, the first n elements of the array hash contain the indices of odata to be updated from the first n elements of idata. Obviously, this leads to a terrible, terrible lack of coalescing. In my code, the hash on one kernel call is completely unrelated to the hash on another (and other kernels update the data in other ways), so simply reordering the data to optimize this particular kernel is not an option.

Is there any feature in CUDA that would allow me to improve performance in this situation? I have heard a lot of talk about texture memory, but I have not been able to translate what I have read into a solution to this problem.

+3

c++ c gpgpu cuda

coastal Oct 17 '12 at 15:53

2 answers

[Answer text lost in extraction; the surviving fragments mention 1K, 0, 8K and odata.]

0

ahmad 18 . '12 16:28

Robert Crovella · Accepted Answer · 2012-10-17T16:16:23+0000

Texturing is a read-only mechanism, so it cannot directly improve the performance of scattered writes to global memory (GMEM). If you were instead using the hash like this:

odata[i] = idata[hash[i]];

(maybe your algorithm could be converted?)

Then there might be some benefit to considering the texture mechanism. (Your example appears to be 1D in nature.)
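As a rough sketch of that gather form, the kernel below writes odata[i] with coalesced stores and pulls the scattered reads of idata through the read-only data cache using __ldg. This assumes a device of compute capability 3.5 or newer and that the algorithm can actually be restated as a gather; the names gather_kernel and run_gather are hypothetical and not part of the original question.

```cuda
#include <cuda_runtime.h>
#include <vector>

// Gather form: the store to odata[i] is coalesced, and only the load from
// idata is scattered. (Assumption: the algorithm can be restated this way.)
__global__ void
gather_kernel (int n, int *__restrict__ odata,
               const int *__restrict__ idata,
               const int *__restrict__ hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    // __ldg routes the scattered load through the read-only data cache.
    odata[i] = __ldg (&idata[hash[i]]);
}

// Hypothetical host helper: copies the inputs to the device, runs the
// gather, and returns the result. Error checking is omitted for brevity.
std::vector<int>
run_gather (const std::vector<int> &idata, const std::vector<int> &hash)
{
  int n = (int) hash.size ();
  int *d_in, *d_hash, *d_out;
  cudaMalloc (&d_in, idata.size () * sizeof (int));
  cudaMalloc (&d_hash, n * sizeof (int));
  cudaMalloc (&d_out, n * sizeof (int));
  cudaMemcpy (d_in, idata.data (), idata.size () * sizeof (int),
              cudaMemcpyHostToDevice);
  cudaMemcpy (d_hash, hash.data (), n * sizeof (int),
              cudaMemcpyHostToDevice);

  int threads = 256;
  int blocks = (n + threads - 1) / threads;
  gather_kernel<<<blocks, threads>>> (n, d_out, d_in, d_hash);

  std::vector<int> out (n);
  // This blocking copy also waits for the kernel to finish.
  cudaMemcpy (out.data (), d_out, n * sizeof (int), cudaMemcpyDeviceToHost);
  cudaFree (d_in);
  cudaFree (d_hash);
  cudaFree (d_out);
  return out;
}
```

Calling run_gather with idata = {10, 20, 30, 40} and hash = {3, 0, 2, 1} should fill the output with idata[hash[i]] for each i. Whether this beats the scatter form depends on how much locality the hash has, so it is worth profiling both.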

You can also make sure that the shared memory / L1 split is optimized for cache. This will not help much with scattered writes, though.
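One way to request that split is cudaFuncSetCacheConfig from the CUDA runtime API, sketched below for the kernel from the question. The helper name prefer_l1_for_kernel is hypothetical. The setting is only a preference, and it has no effect on devices where the L1 and shared memory sizes are fixed.

```cuda
#include <cuda_runtime.h>

// The scatter kernel from the question, repeated so the sketch is complete.
__global__ void
kernel (int n, int *odata, int *idata, int *hash)
{
  int i = blockIdx.x * blockDim.x + threadIdx.x;
  if (i < n)
    odata[hash[i]] = idata[i];
}

// Hypothetical helper: asks the runtime to favor a larger L1 cache over
// shared memory for this kernel. Call it once before launching the kernel.
cudaError_t
prefer_l1_for_kernel (void)
{
  return cudaFuncSetCacheConfig (kernel, cudaFuncCachePreferL1);
}
```

Since this kernel uses no shared memory, preferring L1 costs nothing, but as noted above it is unlikely to change the cost of the scattered stores much.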
