Is it better to use one large kernel or several CUDA kernels?

What's better? I need to process the data in several stages, and it seems to me that I have two options: 1) use one large kernel, or 2) use a separate kernel for each stage.

There is some latency when launching a kernel, but does it really matter in this case? Is the latency of one large kernel the same as the sum of the latencies of several smaller kernels?

Are there any advantages to one approach over the other?

Thanks guys.

1 answer

The kernel launch latency on a Fermi card is about 10 µs, so there is nothing to worry about. To render a single scene in a game, for example, you have to run many different shaders (which are kernels too).
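If you want to check the launch overhead on your own GPU, here is a minimal sketch using the CUDA runtime event API. The names `empty_kernel` and `average_launch_time_us` are hypothetical. Because the launches are asynchronous and queued back to back, this measures an average per-launch cost; a single isolated launch followed by a synchronization can take somewhat longer.

```cuda
#include <cassert>
#include <cstdio>
#include <cuda_runtime.h>

// An empty kernel: its run time is dominated by launch overhead.
__global__ void empty_kernel() {}

// Returns the average time in microseconds per launch of an empty kernel,
// measured over `launches` back-to-back launches with CUDA events.
// Returns -1 if a CUDA call fails (e.g. no GPU is available).
float average_launch_time_us(int launches) {
    assert(launches > 0);
    cudaEvent_t start, stop;
    if (cudaEventCreate(&start) != cudaSuccess) return -1.0f;
    if (cudaEventCreate(&stop) != cudaSuccess) {
        cudaEventDestroy(start);
        return -1.0f;
    }

    // Warm-up launch so one-time initialization costs are not measured.
    empty_kernel<<<1, 1>>>();
    cudaDeviceSynchronize();

    cudaEventRecord(start);
    for (int i = 0; i < launches; ++i) {
        empty_kernel<<<1, 1>>>();
    }
    cudaEventRecord(stop);
    cudaEventSynchronize(stop);

    float ms = 0.0f;
    cudaEventElapsedTime(&ms, start, stop);
    cudaError_t err = cudaGetLastError();
    cudaEventDestroy(start);
    cudaEventDestroy(stop);
    if (err != cudaSuccess) return -1.0f;
    return ms * 1000.0f / launches;  // milliseconds -> microseconds per launch
}
```

Every launch pays this overhead, so N kernels cost roughly N times as much as one. At the roughly 10 µs per launch quoted for Fermi, splitting one kernel into three adds on the order of 20 µs, which only matters when the kernels themselves are very short or are launched very often.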

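A more important factor is memory traffic. Each separate kernel has to read its input from global memory and write its result back, so splitting the work into several kernels adds extra reads/writes.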

Suppose you have three stages A, B and C: with one kernel the pattern is READ-A-B-C-WRITE, whereas with separate kernels it is READ-A-WRITE-READ-B-WRITE-READ-C-WRITE.
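A minimal sketch of the two patterns, assuming purely elementwise stages. The stage functions `stage_a`, `stage_b`, `stage_c` (trivial arithmetic standing in for real work) and the helper `fused_and_split_agree` are hypothetical; the helper just runs both variants and checks that they give the same output.

```cuda
#include <cassert>
#include <cuda_runtime.h>
#include <vector>

// Hypothetical per-element stages A, B and C (placeholders for real work).
__device__ float stage_a(float x) { return x + 1.0f; }
__device__ float stage_b(float x) { return x * 2.0f; }
__device__ float stage_c(float x) { return x - 3.0f; }

// One kernel: READ - A - B - C - WRITE.
// Intermediate values stay in registers; global memory is touched twice per element.
__global__ void fused_abc(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = stage_c(stage_b(stage_a(in[i])));
}

// Separate kernels: READ - A - WRITE, READ - B - WRITE, READ - C - WRITE.
// Each intermediate result makes a round trip through global memory.
__global__ void kernel_a(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = stage_a(in[i]);
}

// B and C update the buffer in place; each thread only touches its own element.
__global__ void kernel_b(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = stage_b(data[i]);
}

__global__ void kernel_c(float* data, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] = stage_c(data[i]);
}

// Runs both variants on host_in and returns true if they produce identical output.
// Returns false if any CUDA call fails (e.g. no GPU is available).
bool fused_and_split_agree(const std::vector<float>& host_in) {
    assert(!host_in.empty());
    const int n = static_cast<int>(host_in.size());
    const size_t bytes = host_in.size() * sizeof(float);
    const int threads = 256;
    const int blocks = (n + threads - 1) / threads;

    float *d_in = nullptr, *d_fused = nullptr, *d_split = nullptr;
    bool ok = cudaMalloc(&d_in, bytes) == cudaSuccess &&
              cudaMalloc(&d_fused, bytes) == cudaSuccess &&
              cudaMalloc(&d_split, bytes) == cudaSuccess &&
              cudaMemcpy(d_in, host_in.data(), bytes, cudaMemcpyHostToDevice) == cudaSuccess;

    std::vector<float> fused(host_in.size()), split(host_in.size());
    if (ok) {
        fused_abc<<<blocks, threads>>>(d_in, d_fused, n);  // one launch
        kernel_a<<<blocks, threads>>>(d_in, d_split, n);   // three launches
        kernel_b<<<blocks, threads>>>(d_split, n);
        kernel_c<<<blocks, threads>>>(d_split, n);
        // cudaMemcpy waits for the preceding kernels in the default stream.
        ok = cudaGetLastError() == cudaSuccess &&
             cudaMemcpy(fused.data(), d_fused, bytes, cudaMemcpyDeviceToHost) == cudaSuccess &&
             cudaMemcpy(split.data(), d_split, bytes, cudaMemcpyDeviceToHost) == cudaSuccess;
    }
    cudaFree(d_in);  // cudaFree(nullptr) is a no-op
    cudaFree(d_fused);
    cudaFree(d_split);
    return ok && fused == split;
}
```

With 4-byte floats, the fused kernel moves 8 bytes of global memory per element (one read, one write), while the three separate kernels move 24 bytes. That threefold traffic dominates when the stages are cheap, regardless of launch latency. Fusion is this simple only when each stage needs just its own element; if, say, B needs neighboring results of A, the kernel boundary acts as a grid-wide synchronization point, which is a reason to keep the stages separate.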


