Invalid global memory access caused by indirect access in CUDA

Question

My CUDA program suffers from uncoalesced access to global memory. Although the idx-th thread deals only with the [idx]-th cell of the array, there are many indirect memory accesses, as shown below.

int idx=blockDim.x*blockIdx.x+threadIdx.x;

.... = FF[m_front[m_fside[idx]]];

For m_fside[idx] the accesses are coalesced, but what we really need is FF[m_front[m_fside[idx]]]. That is a two-level indirect access.

I tried to find some pattern in the data of m_front or m_fside that would turn this into direct sequential access, but found that they are almost "random".

Is there any way to handle this?

+5

gpu gpgpu cuda

thierry Feb 28 '13 at 5:43

1 answer

JackOLantern · Accepted Answer · 2013-03-01T22:16:43+0000

Speeding up random access to global memory: bypassing the L1 cache

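The hardware supports two kinds of loads from global memory. Full caching is the default mode: a load tries to hit in L1, then L2, then GMEM, and the load granularity is a 128-byte line. An L2-only load tries to hit in L2, then GMEM, and the load granularity is 32 bytes. For some random access patterns, memory efficiency can be improved by bypassing L1 and exploiting the finer granularity of L2. This is done by passing the option -Xptxas -dlcm=cg to nvcc.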
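As a rough sketch of how this could be tried on the access pattern from the question: the kernel below performs the two-level gather, and the comment at the top shows the build line with the L1 bypass. The file name gather.cu, the host helper countGatherMismatches, the array size and the pseudo-random index data are all assumptions made for illustration. Whether the flag helps depends on the GPU and the data, so it is worth timing both builds.

```cuda
// Build with L1 bypassed for global loads (file name is an assumption):
//   nvcc -Xptxas -dlcm=cg gather.cu -o gather
// Default build, for comparison:
//   nvcc gather.cu -o gather
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Two-level gather from the question: out[idx] = FF[m_front[m_fside[idx]]].
__global__ void gather(const float* FF, const int* m_front,
                       const int* m_fside, float* out, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n)
        out[idx] = FF[m_front[m_fside[idx]]];
}

// Hypothetical host driver: fills the index arrays with pseudo-random
// in-range values, runs the kernel, and returns the number of elements
// that differ from a CPU reference (-1 if the copy back failed).
int countGatherMismatches(int n)
{
    std::vector<float> hFF(n), hOut(n, -1.0f);
    std::vector<int> hFront(n), hFside(n);
    srand(1234);
    for (int i = 0; i < n; ++i) {
        hFF[i]    = static_cast<float>(i);
        hFront[i] = rand() % n;
        hFside[i] = rand() % n;
    }

    float *dFF = nullptr, *dOut = nullptr;
    int *dFront = nullptr, *dFside = nullptr;
    cudaMalloc(&dFF,    n * sizeof(float));
    cudaMalloc(&dOut,   n * sizeof(float));
    cudaMalloc(&dFront, n * sizeof(int));
    cudaMalloc(&dFside, n * sizeof(int));
    cudaMemcpy(dFF,    hFF.data(),    n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dFront, hFront.data(), n * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(dFside, hFside.data(), n * sizeof(int),   cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    gather<<<blocks, threads>>>(dFF, dFront, dFside, dOut, n);

    // The blocking copy also waits for the kernel to finish.
    cudaError_t err = cudaMemcpy(hOut.data(), dOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dFF); cudaFree(dOut); cudaFree(dFront); cudaFree(dFside);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != hFF[hFront[hFside[i]]])
            ++mismatches;
    return mismatches;
}

int main()
{
    int bad = countGatherMismatches(1 << 20);
    printf("mismatches: %d\n", bad);
    return bad != 0;
}
```

The kernel source is identical in both builds; only the compiler option changes how the loads are issued, so the comparison comes down to profiling the two executables.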

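General guidelines for accelerating global memory access: disabling ECC

The GPU supports Error Correcting Code (ECC), and ECC is enabled by default. ECC reduces peak memory bandwidth and is needed to protect data integrity in some applications. If it is not needed, it can be disabled to improve performance, using the nvidia-smi utility on Linux or the control panel on Microsoft Windows. Note that switching ECC on or off requires a reboot to take effect.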
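Before changing the setting, it can help to check whether ECC is actually enabled on the device in use. The following is a minimal sketch that reads the ECCEnabled field of cudaDeviceProp through the CUDA runtime; the helper name eccStatus is hypothetical and not part of the original answer.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical helper: returns 1 if ECC is enabled on the given device,
// 0 if it is disabled, and -1 if the properties could not be queried.
int eccStatus(int device)
{
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, device) != cudaSuccess)
        return -1;
    return prop.ECCEnabled ? 1 : 0;
}

int main()
{
    int count = 0;
    if (cudaGetDeviceCount(&count) != cudaSuccess || count == 0) {
        printf("no CUDA device found\n");
        return 1;
    }
    for (int d = 0; d < count; ++d) {
        int status = eccStatus(d);
        if (status < 0)
            printf("device %d: query failed\n", d);
        else
            printf("device %d: ECC %s\n", d, status ? "enabled" : "disabled");
    }
    return 0;
}
```

Many consumer GPUs do not support ECC at all, in which case the field simply reads as disabled and there is nothing to turn off.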

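General guidelines for accelerating global memory access on Kepler: using the read-only data cache

Kepler has a 48 KB cache for data that is known to be read-only for the duration of the function. Using the read-only path is beneficial because it takes load off the Shared/L1 cache path. The read-only path can be managed automatically by the compiler (declare the pointers const __restrict__) or explicitly by the programmer (load through __ldg()).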
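Applied to the gather from the question, the two options might look like the sketch below. It assumes FF, m_front and m_fside are not written while the kernel runs and do not alias the output. The kernel names, the host helper countReadOnlyMismatches and the test data are hypothetical. Note that __ldg() requires a device of compute capability 3.5 or higher, and with const __restrict__ the compiler is allowed, but not obliged, to use the read-only path.

```cuda
// Build for a target of compute capability 3.5 or higher, since __ldg()
// is not available below that.
#include <cstdio>
#include <cstdlib>
#include <vector>
#include <cuda_runtime.h>

// Option 1: const + __restrict__ lets the compiler pick the read-only path.
__global__ void gatherRestrict(const float* __restrict__ FF,
                               const int* __restrict__ m_front,
                               const int* __restrict__ m_fside,
                               float* __restrict__ out, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n)
        out[idx] = FF[m_front[m_fside[idx]]];
}

// Option 2: request the read-only path explicitly for every load.
__global__ void gatherLdg(const float* FF, const int* m_front,
                          const int* m_fside, float* out, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n) {
        int side  = __ldg(&m_fside[idx]);
        int front = __ldg(&m_front[side]);
        out[idx]  = __ldg(&FF[front]);
    }
}

// Hypothetical host driver: runs one of the two kernels on pseudo-random
// indices and returns the number of elements that differ from a CPU
// reference (-1 if the copy back failed).
int countReadOnlyMismatches(bool useLdg, int n)
{
    std::vector<float> hFF(n), hOut(n, -1.0f);
    std::vector<int> hFront(n), hFside(n);
    srand(1234);
    for (int i = 0; i < n; ++i) {
        hFF[i]    = static_cast<float>(i);
        hFront[i] = rand() % n;
        hFside[i] = rand() % n;
    }

    float *dFF = nullptr, *dOut = nullptr;
    int *dFront = nullptr, *dFside = nullptr;
    cudaMalloc(&dFF,    n * sizeof(float));
    cudaMalloc(&dOut,   n * sizeof(float));
    cudaMalloc(&dFront, n * sizeof(int));
    cudaMalloc(&dFside, n * sizeof(int));
    cudaMemcpy(dFF,    hFF.data(),    n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dFront, hFront.data(), n * sizeof(int),   cudaMemcpyHostToDevice);
    cudaMemcpy(dFside, hFside.data(), n * sizeof(int),   cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    if (useLdg)
        gatherLdg<<<blocks, threads>>>(dFF, dFront, dFside, dOut, n);
    else
        gatherRestrict<<<blocks, threads>>>(dFF, dFront, dFside, dOut, n);

    cudaError_t err = cudaMemcpy(hOut.data(), dOut, n * sizeof(float),
                                 cudaMemcpyDeviceToHost);
    cudaFree(dFF); cudaFree(dOut); cudaFree(dFront); cudaFree(dFside);
    if (err != cudaSuccess)
        return -1;

    int mismatches = 0;
    for (int i = 0; i < n; ++i)
        if (hOut[i] != hFF[hFront[hFside[i]]])
            ++mismatches;
    return mismatches;
}

int main()
{
    int badRestrict = countReadOnlyMismatches(false, 1 << 20);
    int badLdg      = countReadOnlyMismatches(true,  1 << 20);
    printf("restrict mismatches: %d, __ldg mismatches: %d\n",
           badRestrict, badLdg);
    return (badRestrict != 0) || (badLdg != 0);
}
```

The explicit __ldg() version is the more predictable of the two when the goal is to force the read-only path; the __restrict__ version keeps the kernel body unchanged from the original code.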
