CUDA fft - cooley tukey, how does parallelism work?

I know how the FFT implementation ( Cooley-Tuckey algorithm ) works , and I know that the CUFFT CUFFT library for computing 1D or 2D FFT, but I would like to know how CUDA parallelism is used in the process.

Is this related to butterfly calculation? (is something like each thread loading part of the data into shared memory and then each thread calculating an even member or an odd member?)

+5
source share
1 answer

I don’t think that they use the Cooley-Taki algorithm because the index permutation phase makes it not very convenient for shared memory architectures. In addition, this algorithm works with power memory targets, which are also not suitable for memory collaboration. Most likely, they use some formulation of the Stockham FFT self-test: for example, the Bailey algorithm .

, , FFT , . , 512 1024- FFT ( , ) 128 . , radix-2 - . radix-8 radix-16, "" . , this "" .

+6

All Articles