As you are discovering, cudaMemset works like the standard C library memset. Quoting from the documentation:
cudaError_t cudaMemset(void *devPtr, int value, size_t count)
Fills the first count bytes of the memory area pointed to by devPtr with the constant byte value value.
So value is a byte value. If you do something like:
int *devPtr;
cudaMalloc((void **)&devPtr,number_bytes);
const int value = 5;
cudaMemset(devPtr,value,number_bytes);
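then you are asking for every byte of the memory at devPtr to be set to 5. If devPtr points to an array of integers, each integer word ends up with the value 84215045 (0x05050505). That is probably not what you had in mind.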
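Using the runtime API, what you can do instead is write your own generic kernel for this. It could be as simple as: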
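template<typename T>
__global__ void initKernel(T * devPtr, const T val, const size_t nwords)
{
    // Grid-stride loop: each thread writes every stride-th word.
    // size_t indices avoid a signed/unsigned comparison and overflow on large arrays.
    size_t tidx = threadIdx.x + (size_t)blockDim.x * blockIdx.x;
    const size_t stride = (size_t)blockDim.x * gridDim.x;
    for(; tidx < nwords; tidx += stride)
        devPtr[tidx] = val;
}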
(Standard disclaimer: written in the browser, never compiled, never tested, use at your own risk.)
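Just instantiate the template for the types you need and call it with a suitable grid and block size, keeping in mind that the last argument is now a word count, not a byte count as in cudaMemset. This isn't really any different from what cudaMemset does anyway: that API call results in a kernel launch that is not too different from the kernel above.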
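As a rough sketch of what such a call might look like for int, under some assumptions: the block size of 256 and grid size of 64 are arbitrary picks, not tuned values, and fillAndVerify is a hypothetical helper added only to check the result. The kernel is repeated (with size_t indices) so the example stands alone.

```cuda
#include <cstdio>
#include <vector>
#include <cuda_runtime.h>

// The grid-stride kernel from above, repeated so this example is self-contained.
template<typename T>
__global__ void initKernel(T * devPtr, const T val, const size_t nwords)
{
    size_t tidx = threadIdx.x + (size_t)blockDim.x * blockIdx.x;
    const size_t stride = (size_t)blockDim.x * gridDim.x;
    for(; tidx < nwords; tidx += stride)
        devPtr[tidx] = val;
}

// Hypothetical helper: fills nwords ints on the device with val, copies them
// back, and reports whether every word holds val.
bool fillAndVerify(int val, size_t nwords)
{
    int *devPtr = nullptr;
    if (cudaMalloc((void **)&devPtr, nwords * sizeof(int)) != cudaSuccess)
        return false;

    const int threads = 256;  // assumed block size
    const int blocks = 64;    // assumed grid size; the stride loop covers the rest
    initKernel<int><<<blocks, threads>>>(devPtr, val, nwords);  // word count, not bytes

    std::vector<int> host(nwords);
    // The blocking copy runs after the kernel on the default stream.
    bool ok = cudaGetLastError() == cudaSuccess &&
              cudaMemcpy(host.data(), devPtr, nwords * sizeof(int),
                         cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(devPtr);

    for (size_t i = 0; ok && i < nwords; ++i)
        ok = (host[i] == val);
    return ok;
}

int main()
{
    const bool ok = fillAndVerify(5, 1 << 20);
    std::printf("%s\n", ok ? "all words equal 5" : "fill failed");
    return ok ? 0 : 1;
}
```

Because the kernel uses a stride loop, the grid does not have to match the array size; any reasonable launch configuration covers all nwords elements.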
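Alternatively, if you can use the driver API, there are cuMemsetD16 and cuMemsetD32, which do the same thing for 16-bit and 32-bit word types. If you need to set 64-bit or larger types (doubles or vector types, say), your best option is to use your own kernel.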
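A hedged sketch of the cuMemsetD32 route is below. It assumes you are willing to mix the two APIs: the memory is allocated with the runtime's cudaMalloc, which also sets up the context the driver call then uses, and the pointer is cast to CUdeviceptr. The helper name fillWithDriverMemset is made up for illustration, and the program has to be linked against the driver library (for example with -lcuda).

```cuda
#include <cstdio>
#include <vector>
#include <cuda.h>
#include <cuda_runtime.h>

// Hypothetical helper: sets nwords 32-bit words to val with cuMemsetD32,
// copies them back, and reports whether every word holds val.
bool fillWithDriverMemset(unsigned int val, size_t nwords)
{
    unsigned int *devPtr = nullptr;
    // Assumption: allocating through the runtime API initializes the context
    // that the driver API call below operates in.
    if (cudaMalloc((void **)&devPtr, nwords * sizeof(unsigned int)) != cudaSuccess)
        return false;

    // The last argument counts 32-bit elements, not bytes.
    bool ok = cuMemsetD32(reinterpret_cast<CUdeviceptr>(devPtr), val, nwords)
              == CUDA_SUCCESS;

    std::vector<unsigned int> host(nwords);
    ok = ok && cudaMemcpy(host.data(), devPtr, nwords * sizeof(unsigned int),
                          cudaMemcpyDeviceToHost) == cudaSuccess;
    cudaFree(devPtr);

    for (size_t i = 0; ok && i < nwords; ++i)
        ok = (host[i] == val);
    return ok;
}

int main()
{
    const bool ok = fillWithDriverMemset(5u, 1 << 20);
    std::printf("%s\n", ok ? "all words equal 5" : "fill failed");
    return ok ? 0 : 1;
}
```

In a pure driver API program you would create the context and allocate the memory with the driver's own calls instead; that setup is left out here to keep the sketch short.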