Does a modern GPU (e.g. Fermi / Evergreen) support out-of-order execution?

I am writing an OpenCL kernel that contains several barriers inside a loop. I tested the kernel on a CPU (8-core FX8150), and the results show that these barriers reduce the speed by 50-100 times. I also confirmed this by re-implementing the kernel in Java using multi-threading + CyclicBarrier. I suspect the reason is that the barrier essentially stops the CPU from taking advantage of out-of-order execution, so I am a little worried whether I would see the same magnitude of slowdown on the GPU. I checked a few official docs and googled a bit, but there is not much information on this topic.
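For reference, a minimal Java sketch of the kind of CyclicBarrier experiment described above: each worker thread runs a loop and waits at a shared barrier once per iteration, roughly mirroring a `barrier()` call inside a kernel loop. The class name `BarrierLoop`, the per-iteration work, and the iteration count are hypothetical placeholders, not the original kernel.

```java
import java.util.concurrent.BrokenBarrierException;
import java.util.concurrent.CyclicBarrier;

final class BarrierLoop {
    // Runs `iterations` rounds on `threads` workers. When useBarrier is true, every
    // worker calls barrier.await() once per round, roughly like barrier() in a kernel loop.
    static long[] run(int threads, int iterations, boolean useBarrier) {
        final long[] sums = new long[threads];
        final CyclicBarrier barrier = new CyclicBarrier(threads);
        final Thread[] workers = new Thread[threads];
        for (int t = 0; t < threads; t++) {
            final int id = t;
            workers[t] = new Thread(() -> {
                long acc = 0;
                try {
                    for (int i = 0; i < iterations; i++) {
                        acc += (long) i * (id + 1); // hypothetical stand-in for the real work
                        if (useBarrier) {
                            barrier.await(); // no thread continues until all have arrived
                        }
                    }
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // restore interrupt status, stop worker
                    return;
                } catch (BrokenBarrierException e) {
                    return; // another worker broke the barrier; it can no longer be used
                }
                sums[id] = acc; // made visible to the caller by join() below
            });
            workers[t].start();
        }
        try {
            for (Thread w : workers) {
                w.join();
            }
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for workers", e);
        }
        return sums;
    }

    public static void main(String[] args) {
        int threads = Runtime.getRuntime().availableProcessors();
        int iterations = 100_000;
        for (boolean useBarrier : new boolean[] {false, true}) {
            long start = System.nanoTime();
            run(threads, iterations, useBarrier);
            long ms = (System.nanoTime() - start) / 1_000_000;
            System.out.println((useBarrier ? "with barrier: " : "no barrier:   ") + ms + " ms");
        }
    }
}
```

Timings depend on the machine and JVM, and this naive timing ignores JIT warm-up, so treat the printed numbers as a rough comparison; a harness such as JMH would give more trustworthy figures.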

Modern GPUs are in-order pipelined processors. GPUs keep the pipeline full by interleaving instructions from different warps (wavefronts). By comparison, CPUs use an out-of-order design to keep the pipeline full. There are separate functional units, such as ALUs and SFUs, each with its own pipeline. But note that an instruction dependency stalls the warp. For more information on how GPUs resolve instruction dependencies, see
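To make the in-order point concrete, here is a toy model in Java. It is not how any real GPU scheduler is implemented: each warp is a chain of dependent instructions, a warp's next instruction may only issue `latency` cycles after its previous one, and a single in-order issue slot picks one ready warp per cycle in round-robin order. The class `WarpScheduler` and the numbers below are hypothetical. In this model, one warp leaves the issue slot idle most of the time while it waits on its own dependency, whereas with at least `latency` warps there is always a ready warp to issue from.

```java
import java.util.Arrays;

final class WarpScheduler {
    // Cycles needed to issue every instruction when each warp is a chain of
    // dependent instructions and at most one instruction issues per cycle.
    static long cyclesToIssue(int warps, int instrPerWarp, int latency) {
        long[] readyAt = new long[warps];   // earliest cycle each warp's next instruction may issue
        int[] remaining = new int[warps];   // instructions left per warp
        Arrays.fill(remaining, instrPerWarp);
        long left = (long) warps * instrPerWarp;
        long cycle = 0;
        int next = 0;                       // round-robin starting point
        while (left > 0) {
            for (int k = 0; k < warps; k++) {
                int w = (next + k) % warps;
                if (remaining[w] > 0 && readyAt[w] <= cycle) {
                    remaining[w]--;
                    left--;
                    readyAt[w] = cycle + latency; // dependent instruction waits for this result
                    next = (w + 1) % warps;
                    break;                        // one issue slot per cycle
                }
            }
            cycle++;                              // idle cycle if no warp was ready
        }
        return cycle;
    }

    public static void main(String[] args) {
        int instrPerWarp = 10;
        int latency = 4;
        for (int warps : new int[] {1, 2, 4, 8}) {
            long cycles = cyclesToIssue(warps, instrPerWarp, latency);
            double utilization = (double) warps * instrPerWarp / cycles;
            System.out.printf("warps=%d cycles=%d issue-slot utilization=%.2f%n",
                    warps, cycles, utilization);
        }
    }
}
```

With `instrPerWarp = 10` and `latency = 4`, one warp needs 37 cycles to issue 10 instructions, while four warps issue 40 instructions in 40 cycles with no idle slots, which is the "interleaving warps fills the pipeline" effect described above.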

The Next Generation NVIDIA CUDA Compute and Graphics Architecture, code-named "Fermi":

Nvidia GigaThread Engine (p. 5)

  • 10-
  • :)

Evergreen SIMD … HD 7000 … GTX 600 (… 10 …)
