Uncoalesced global memory access caused by indirect access in CUDA

My CUDA program suffers from uncoalesced global memory access. Although the idx-th thread only deals with the [idx]-th cell of an array, there are many indirect memory accesses, as shown below.

int idx=blockDim.x*blockIdx.x+threadIdx.x;

.... = FF[m_front[m_fside[idx]]];

For m_fside[idx] we have coalesced accesses, but what we really need is FF[m_front[m_fside[idx]]]. This is a two-level indirect access.

I tried to find some pattern in the data of m_front or m_fside that would turn this into direct sequential access, but they turned out to be almost "random".

Is there any way to handle this?

1 answer

Speeding up random access to global memory: invalidating L1 cache line

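Fermi and Kepler architectures support two types of loads from global memory. Full caching is the default mode: a load tries to hit in L1, then L2, then GMEM, and the load granularity is a 128-byte line. An L2-only load tries to hit in L2, then GMEM, and the load granularity is 32 bytes. For some random access patterns, memory efficiency can be improved by bypassing L1 and exploiting the finer granularity of L2. This can be done by passing the -Xptxas -dlcm=cg option to nvcc.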
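The -Xptxas -dlcm=cg option mentioned above applies to every global load in the file. As a rough sketch of a narrower alternative (not what the answer itself proposes), the `__ldcg()` cache-hint intrinsic can request L2-only caching for just the two scattered reads, assuming a toolkit and GPU architecture that provide it. The names `gather_cg`, `run_gather_cg`, `out` and `n` are made up for illustration; only `FF`, `m_front` and `m_fside` come from the question.

```cuda
// Whole-file variant from the text above (a build flag, not code):
//   nvcc -Xptxas -dlcm=cg gather_cg.cu -o gather_cg
#include <cstdio>
#include <cuda_runtime.h>

// Two-level gather: out[idx] = FF[m_front[m_fside[idx]]].
// __ldcg() is used here as a per-load alternative to the global flag (assumption:
// the toolkit and target architecture support this intrinsic).
__global__ void gather_cg(const float* FF, const int* m_front,
                          const int* m_fside, float* out, int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n) {
        int side  = m_fside[idx];            // neighbouring threads read neighbouring elements
        int front = __ldcg(&m_front[side]);  // scattered read, requested as L2-only
        out[idx]  = __ldcg(&FF[front]);      // scattered read, requested as L2-only
    }
}

// Hypothetical helper: builds small pseudo-random index arrays in managed memory,
// runs the kernel, and returns the number of mismatches against a CPU reference
// (-1 if allocation fails).
int run_gather_cg(int n)
{
    float *FF = nullptr, *out = nullptr;
    int *m_front = nullptr, *m_fside = nullptr;
    if (cudaMallocManaged(&FF, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&out, n * sizeof(float)) != cudaSuccess ||
        cudaMallocManaged(&m_front, n * sizeof(int)) != cudaSuccess ||
        cudaMallocManaged(&m_fside, n * sizeof(int)) != cudaSuccess) {
        return -1;
    }
    for (int i = 0; i < n; ++i) {
        FF[i]      = 0.5f * i;
        out[i]     = -1.0f;
        m_front[i] = (i * 7 + 3) % n;   // always in [0, n)
        m_fside[i] = (i * 13 + 5) % n;  // always in [0, n)
    }
    gather_cg<<<(n + 255) / 256, 256>>>(FF, m_front, m_fside, out, n);
    cudaDeviceSynchronize();

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != FF[m_front[m_fside[i]]]) ++mismatches;
    }
    cudaFree(FF); cudaFree(out); cudaFree(m_front); cudaFree(m_fside);
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_gather_cg(1 << 16));
    return 0;
}
```

Whether this helps depends on the hardware generation and on how scattered the indices really are, so it is worth profiling both the flag and the per-load variant before settling on one.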

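General guidelines for speeding up global memory access: disabling ECC support

Fermi and Kepler GPUs support Error Correcting Code (ECC), and ECC is enabled by default. ECC reduces peak memory bandwidth and is intended for applications that need stronger data integrity, such as medical imaging and large-scale cluster computing. If it is not needed, it can be disabled for better performance using the nvidia-smi utility on Linux, or through the Control Panel on Microsoft Windows. Note that switching ECC on or off requires a reboot to take effect.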

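General guidelines for speeding up global memory access on Kepler: using the read-only data cache

Kepler features a 48 KB cache for data that is known to be read-only for the duration of the function. Using the read-only path is beneficial because it offloads the Shared/L1 cache path and supports full-speed unaligned memory access. The read-only path can be managed automatically by the compiler (use the const __restrict keywords) or explicitly by the programmer (use the __ldg() intrinsic).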
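Applied to the access pattern in the question, the read-only path might look roughly like the sketch below. It assumes that FF, m_front and m_fside are never written while the kernel runs and that the GPU is of compute capability 3.5 or higher, which `__ldg()` requires. The names `gather_ldg`, `run_gather_ldg`, `out` and `n` are hypothetical, as is the element type of each array.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// Two-level gather: out[idx] = FF[m_front[m_fside[idx]]].
// const + __restrict__ lets the compiler treat the inputs as read-only and
// non-aliased; __ldg() explicitly requests the read-only data cache path.
__global__ void gather_ldg(const float* __restrict__ FF,
                           const int*   __restrict__ m_front,
                           const int*   __restrict__ m_fside,
                           float*       __restrict__ out,
                           int n)
{
    int idx = blockDim.x * blockIdx.x + threadIdx.x;
    if (idx < n) {
        int side  = m_fside[idx];           // neighbouring threads read neighbouring elements
        int front = __ldg(&m_front[side]);  // scattered read through the read-only path
        out[idx]  = __ldg(&FF[front]);      // scattered read through the read-only path
    }
}

// Hypothetical host helper: runs the kernel on pseudo-random index arrays and
// returns the number of mismatches against a CPU reference.
int run_gather_ldg(int n)
{
    std::vector<float> FF(n), out(n, -1.0f);
    std::vector<int> front(n), fside(n);
    for (int i = 0; i < n; ++i) {
        FF[i]    = 0.5f * i;
        front[i] = (i * 7 + 3) % n;   // always in [0, n)
        fside[i] = (i * 13 + 5) % n;  // always in [0, n)
    }

    float *dFF = nullptr, *dOut = nullptr;
    int *dFront = nullptr, *dFside = nullptr;
    cudaMalloc(&dFF, n * sizeof(float));
    cudaMalloc(&dOut, n * sizeof(float));
    cudaMalloc(&dFront, n * sizeof(int));
    cudaMalloc(&dFside, n * sizeof(int));
    cudaMemcpy(dFF, FF.data(), n * sizeof(float), cudaMemcpyHostToDevice);
    cudaMemcpy(dFront, front.data(), n * sizeof(int), cudaMemcpyHostToDevice);
    cudaMemcpy(dFside, fside.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    int threads = 256;
    int blocks  = (n + threads - 1) / threads;
    gather_ldg<<<blocks, threads>>>(dFF, dFront, dFside, dOut, n);
    // The blocking copy below also waits for the kernel to finish.
    cudaMemcpy(out.data(), dOut, n * sizeof(float), cudaMemcpyDeviceToHost);

    cudaFree(dFF); cudaFree(dOut); cudaFree(dFront); cudaFree(dFside);

    int mismatches = 0;
    for (int i = 0; i < n; ++i) {
        if (out[i] != FF[front[fside[i]]]) ++mismatches;
    }
    return mismatches;
}

int main()
{
    printf("mismatches: %d\n", run_gather_ldg(1 << 16));
    return 0;
}
```

The first load is left as a plain access because neighbouring threads read neighbouring elements there; only the two data-dependent loads are routed through `__ldg()`.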
