Why can't I combine data transfer and computing with the GTX 480 and CUDA 5?

Question

I tried to overlap kernel execution with cudaMemcpyAsync, but it does not work. I follow all the recommendations in the programming guide: pinned memory, different streams, etc. I see that kernel executions overlap with each other, but not with memory transfers. I know that my card has only one copy engine and one execution engine, but execution and transfers should still overlap, right?
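The engine counts mentioned above can be read from the device properties rather than assumed. The following is a minimal sketch, not code from the question: the function name `reportOverlapSupport` and the choice of device 0 are hypothetical, and the printed values depend on the installed GPU and driver.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Print the properties that bound copy/compute overlap for one device.
// Returns the CUDA status so callers can detect a missing device.
// (Hypothetical helper, introduced only for illustration.)
cudaError_t reportOverlapSupport(int device) {
    cudaDeviceProp prop;
    cudaError_t err = cudaGetDeviceProperties(&prop, device);
    if (err != cudaSuccess) {
        fprintf(stderr, "cudaGetDeviceProperties: %s\n", cudaGetErrorString(err));
        return err;
    }
    printf("%s (compute capability %d.%d)\n", prop.name, prop.major, prop.minor);
    // 0: no copy/kernel overlap; 1: copies in one direction can overlap a kernel;
    // 2: copies in both directions can overlap a kernel at the same time.
    printf("asyncEngineCount  = %d\n", prop.asyncEngineCount);
    // Non-zero if the device can run several kernels concurrently.
    printf("concurrentKernels = %d\n", prop.concurrentKernels);
    return cudaSuccess;
}

int main() {
    // Device 0 is an assumption; on a multi-GPU machine pick the compute card.
    return reportOverlapSupport(0) == cudaSuccess ? 0 : 1;
}
```

A non-zero `asyncEngineCount` only says that the hardware is capable of overlap; it does not guarantee that a given driver or operating system will schedule the work that way.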

It seems that the "copy engine" and the "execution engine" always enforce the order in which I call the functions. The work consists of 4 streams, each performing [HtoD x2, Kernel, DtoH]. If I issue the whole HtoD x2, Kernel, DtoH series stream by stream (depth-first), I see in the profiler that the first HtoD of stream 2 does not start until the first DtoH has completed. If I instead issue the first HtoD in every stream, then the second HtoD, then the kernels, and then the DtoH copies (breadth-first), I see no overlap at all, and the GPU still enforces the issue order.
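To make the two issue orders concrete, here is a minimal sketch of the same workload shape: 4 streams, each with two host-to-device copies, one kernel, and one device-to-host copy. It is not the poster's code. The kernel `addKernel`, the `Job` struct, the buffer size, and all helper names are hypothetical. The sketch only fixes the order in which work is issued; whether the profiler then shows any overlap depends on the GPU, driver, and operating system.

```cuda
#include <cstdio>
#include <cstdlib>
#include <cuda_runtime.h>

// Abort with a message if a CUDA runtime call fails.
#define CHECK(call)                                                          \
    do {                                                                     \
        cudaError_t e_ = (call);                                             \
        if (e_ != cudaSuccess) {                                             \
            fprintf(stderr, "%s:%d: %s\n", __FILE__, __LINE__,               \
                    cudaGetErrorString(e_));                                 \
            exit(EXIT_FAILURE);                                              \
        }                                                                    \
    } while (0)

const int kStreams = 4;                    // four streams, as in the question
const int kN = 1 << 20;                    // elements per stream (arbitrary)
const size_t kBytes = kN * sizeof(float);

// Hypothetical stand-in kernel: c = a + b.
__global__ void addKernel(const float* a, const float* b, float* c, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) c[i] = a[i] + b[i];
}

struct Job {
    float *hA, *hB, *hC;   // pinned host buffers
    float *dA, *dB, *dC;   // device buffers
    cudaStream_t stream;
};

void copyIn(const Job& j, bool first) {    // one of the two HtoD copies
    CHECK(cudaMemcpyAsync(first ? j.dA : j.dB, first ? j.hA : j.hB, kBytes,
                          cudaMemcpyHostToDevice, j.stream));
}
void launch(const Job& j) {
    addKernel<<<(kN + 255) / 256, 256, 0, j.stream>>>(j.dA, j.dB, j.dC, kN);
    CHECK(cudaGetLastError());
}
void copyOut(const Job& j) {               // the DtoH copy
    CHECK(cudaMemcpyAsync(j.hC, j.dC, kBytes, cudaMemcpyDeviceToHost, j.stream));
}

// Depth-first: all four operations of stream 0, then all of stream 1, ...
void issueDepthFirst(const Job* jobs) {
    for (int s = 0; s < kStreams; ++s) {
        copyIn(jobs[s], true); copyIn(jobs[s], false);
        launch(jobs[s]);
        copyOut(jobs[s]);
    }
}

// Breadth-first: the same operation for every stream before the next one.
void issueBreadthFirst(const Job* jobs) {
    for (int s = 0; s < kStreams; ++s) copyIn(jobs[s], true);
    for (int s = 0; s < kStreams; ++s) copyIn(jobs[s], false);
    for (int s = 0; s < kStreams; ++s) launch(jobs[s]);
    for (int s = 0; s < kStreams; ++s) copyOut(jobs[s]);
}

void setup(Job* jobs) {
    for (int s = 0; s < kStreams; ++s) {
        Job& j = jobs[s];
        // cudaMallocHost returns page-locked (pinned) memory, which
        // cudaMemcpyAsync needs in order to run asynchronously.
        CHECK(cudaMallocHost((void**)&j.hA, kBytes));
        CHECK(cudaMallocHost((void**)&j.hB, kBytes));
        CHECK(cudaMallocHost((void**)&j.hC, kBytes));
        CHECK(cudaMalloc((void**)&j.dA, kBytes));
        CHECK(cudaMalloc((void**)&j.dB, kBytes));
        CHECK(cudaMalloc((void**)&j.dC, kBytes));
        CHECK(cudaStreamCreate(&j.stream));
        for (int i = 0; i < kN; ++i) { j.hA[i] = 1.0f; j.hB[i] = 2.0f; j.hC[i] = 0.0f; }
    }
}

void teardown(Job* jobs) {
    for (int s = 0; s < kStreams; ++s) {
        Job& j = jobs[s];
        CHECK(cudaStreamDestroy(j.stream));
        CHECK(cudaFree(j.dA)); CHECK(cudaFree(j.dB)); CHECK(cudaFree(j.dC));
        CHECK(cudaFreeHost(j.hA)); CHECK(cudaFreeHost(j.hB)); CHECK(cudaFreeHost(j.hC));
    }
}

// True if every stream's output equals 1 + 2 = 3 in every element.
bool verify(const Job* jobs) {
    for (int s = 0; s < kStreams; ++s)
        for (int i = 0; i < kN; ++i)
            if (jobs[s].hC[i] != 3.0f) return false;
    return true;
}

int main() {
    Job jobs[kStreams];
    setup(jobs);
    issueDepthFirst(jobs);
    CHECK(cudaDeviceSynchronize());
    printf("depth-first:   %s\n", verify(jobs) ? "ok" : "MISMATCH");
    issueBreadthFirst(jobs);
    CHECK(cudaDeviceSynchronize());
    printf("breadth-first: %s\n", verify(jobs) ? "ok" : "MISMATCH");
    teardown(jobs);
    return 0;
}
```

Running each variant under a profiler timeline is what distinguishes them; the program's own output only confirms that both orders compute the same result.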

I also tried the simpleStreams example from the CUDA SDK and see the same behavior.

I am attaching some screenshots showing the problem in both the Visual Profiler and Nsight for VS2008.

P.S. I have not set the CUDA_LAUNCH_BLOCKING environment variable.

[Screenshots: simpleStreams; simpleStreams in Visual Profiler; MyApp Nsight timeline, breadth-first; MyApp Nsight timeline, depth-first]

x4 ( 2HtoD, 5 , 1DtoH ) → nvprof --concurrent-kernels-off, . env CUDA_LAUNCH_BLOCKING = 1, ( ) 7,5%!

Environment:

Windows 7
NVIDIA 6800 VGA PCI-E
GTX480 PCI-E
NVIDIA driver: 306.94
Visual Studio 2008
CUDA v5.0
Visual Profiler 5.0
Nsight 3.0

+5

concurrency cuda overlapping nsight

Dredok 22 . '13 10:20

3

. , . , CUDA Windows, Windows. , ( ) Linux.

, "simpleStreams" SDK. , "simpleStreams", Windows, , Linux .

CUDA 5.0 Fermi GTX570. 8800GT GTX Titan , CUDA Windows. , .

0

Kai Xiao 28 '13 2:24

TL; DR: TDR WDDM Nsight Monitor! false, . , TDR , "" , . , , ( ), !

( ) , , , .

! , . , ! GTX 650 GT 640.

, - , gpu ( ), gpu ( ), nvidia ! gpu, , . , !

, concurrency , ( BIOS), , ..
>
"".
" " " ".

. , , , !

:

,
, aero.

, , . , .

, , !

< > ( dgemm, ), , "simpleStreams" concurrency...

: Windows! , !

, , .

-1

Aperture Laboratories 18 . '15 8:31

Dredok · Accepted Answer · 2013-05-26T11:49:28+0000

, CUDA, . 1.1 capabilities (8800 GTS) 3.5 (GTX Titan), . , Fermi ( GTX 480 ).
