Page-locked memory is faster to transfer to the GPU than pageable memory. CUDA provides the calls cudaHostAlloc and cudaHostRegister to allocate or register page-locked memory. On each transfer, the NVIDIA driver then checks whether the host memory is page-locked and, if so, takes a faster path to copy the data.
Is it possible to page-lock memory with the system call mlock() and achieve exactly the same effect (with respect to transfer speeds) as cudaHostRegister? Or does the corresponding CUDA call update an internal database that the driver queries?
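No. As far as I know, the NVIDIA driver does not implement cudaHostAlloc with mlock(). Memory locked via mlock() is capped by the per-process RLIMIT_MEMLOCK limit, whereas page-locked allocations made through CUDA are pinned and tracked by the NVIDIA driver itself.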
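To illustrate the limit that mlock() runs into, here is a minimal Linux-only C sketch. The helper names memlock_soft_limit, alloc_and_mlock and unlock_and_free are hypothetical and not part of any library. The sketch only pins pages in RAM. As far as we understand, the CUDA runtime has no record of such a buffer, so cudaMemcpy would still treat it as pageable memory unless it is also passed to cudaHostRegister.

```c
#define _GNU_SOURCE
#include <errno.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Soft RLIMIT_MEMLOCK in bytes (RLIM_INFINITY if unlimited), or 0 if the
   limit cannot be queried. `ulimit -l` reports the same limit in KiB. */
rlim_t memlock_soft_limit(void) {
    struct rlimit rl;
    if (getrlimit(RLIMIT_MEMLOCK, &rl) != 0) {
        return 0;
    }
    return rl.rlim_cur;
}

/* Hypothetical helper: allocates `size` bytes of page-aligned memory and pins
   it with mlock(). Returns the buffer on success; on failure returns NULL and
   stores the error code in *err (for an unprivileged process, EPERM if
   RLIMIT_MEMLOCK is 0, ENOMEM if the request would exceed it). */
void *alloc_and_mlock(size_t size, int *err) {
    long page = sysconf(_SC_PAGESIZE);
    void *buf = NULL;
    *err = 0;
    if (page <= 0) {
        *err = EINVAL;
        return NULL;
    }
    int rc = posix_memalign(&buf, (size_t)page, size);
    if (rc != 0) {
        *err = rc; /* posix_memalign returns the error instead of setting errno */
        return NULL;
    }
    if (mlock(buf, size) != 0) {
        *err = errno; /* save errno before free() can clobber it */
        free(buf);
        return NULL;
    }
    memset(buf, 0, size); /* pages are locked in RAM and cannot be swapped out */
    return buf;
}

/* Hypothetical helper: undoes alloc_and_mlock(). */
void unlock_and_free(void *buf, size_t size) {
    munlock(buf, size);
    free(buf);
}
```

For example, for a process without CAP_IPC_LOCK whose `ulimit -l` is 64 KiB, alloc_and_mlock(1 << 20, &err) returns NULL with err set to ENOMEM. That cap is one practical reason mlock() is not a drop-in replacement for CUDA's own pinning calls.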
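Note also that cudaHostRegister does more than call mlock(): it pins the pages and registers the range with the driver, which maps it for the GPU's DMA engine. Memory locked only with mlock() is unknown to the driver, so cudaMemcpy still copies it through the slower pageable path.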
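The same holds for the driver API: cuMemHostRegister() registers an existing host range with the driver. That registration, not the page lock alone, lets later copies use the fast path.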