I have successfully written CUDA FFT code that performs a 2D convolution of an image, along with some other calculations.
How do I figure out the largest FFT I can run? A 2D R2C plan seems to occupy 2x the image size, and the C2R plan another 2x. That is a lot of overhead!
Also, most benchmarks seem to cover relatively small FFTs, and for large images I will quickly run out of memory. How is this usually handled? Can I run the FFT convolution on image tiles, combine the results, and expect the same output as if I had performed the 2D FFT on the entire image?
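As a rough way to reason about the footprint, the data buffers alone can be sized by hand. The sketch below assumes single-precision data and the usual half-spectrum R2C layout; the function name is made up for illustration. It deliberately leaves out the plan's work area, which depends on the transform size and would have to be queried from the library (cuFFT exposes `cufftEstimate2d` and `cufftGetSize2d` for this) and compared against the free memory reported by `cudaMemGetInfo`.

```python
def r2c_c2r_buffer_bytes(height, width, real_bytes=4):
    """Bytes needed for the real image and its half-spectrum (hypothetical helper).

    Assumes single precision by default (4-byte reals, 8-byte complex values).
    The plan's work area is NOT included.
    """
    real = height * width * real_bytes
    # R2C keeps only the non-redundant half of the last dimension:
    # width // 2 + 1 complex bins per row.
    spectrum = height * (width // 2 + 1) * 2 * real_bytes
    return real, spectrum


real, spectrum = r2c_c2r_buffer_bytes(8192, 8192)
print(f"real image: {real / 2**20:.2f} MiB, half-spectrum: {spectrum / 2**20:.2f} MiB")
```

So the spectrum buffer is roughly the same size as the image itself, and anything beyond that is plan work area or extra copies.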
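On the tiling question: convolution is linear, so splitting the image into tiles, convolving each zero-padded tile with the kernel, and summing the overlapping borders (the overlap-add method) should reproduce the full-image result up to rounding. Below is a small CPU sketch in plain Python that checks this, using a naive DFT rather than cuFFT. All function names, the test image, and the tile size are made up for illustration.

```python
import cmath


def dft(x, inverse=False):
    """Naive O(n^2) 1D DFT; fine for tiny inputs, stands in for a real FFT."""
    n = len(x)
    sign = 1 if inverse else -1
    out = [sum(x[k] * cmath.exp(sign * 2j * cmath.pi * j * k / n) for k in range(n))
           for j in range(n)]
    return [v / n for v in out] if inverse else out


def dft2(a, inverse=False):
    """2D DFT: transform every row, then every column."""
    rows = [dft(r, inverse) for r in a]
    cols = [dft(list(c), inverse) for c in zip(*rows)]
    return [list(r) for r in zip(*cols)]


def fft_convolve2d(img, ker):
    """Full linear convolution via a zero-padded 2D DFT."""
    H = len(img) + len(ker) - 1
    W = len(img[0]) + len(ker[0]) - 1

    def pad(a):
        return [[a[i][j] if i < len(a) and j < len(a[0]) else 0.0
                 for j in range(W)] for i in range(H)]

    A, B = dft2(pad(img)), dft2(pad(ker))
    prod = [[A[i][j] * B[i][j] for j in range(W)] for i in range(H)]
    return [[v.real for v in row] for row in dft2(prod, inverse=True)]


def tiled_convolve2d(img, ker, tile):
    """Overlap-add: convolve each tile on its own, then sum the overlapping borders."""
    H, W = len(img), len(img[0])
    out = [[0.0] * (W + len(ker[0]) - 1) for _ in range(H + len(ker) - 1)]
    for r0 in range(0, H, tile):
        for c0 in range(0, W, tile):
            block = [row[c0:c0 + tile] for row in img[r0:r0 + tile]]
            part = fft_convolve2d(block, ker)
            # Each partial result spills (kernel size - 1) past the tile edge;
            # adding it into the shared output handles the overlap.
            for i, row in enumerate(part):
                for j, v in enumerate(row):
                    out[r0 + i][c0 + j] += v
    return out


img = [[float((3 * i + 5 * j) % 7) for j in range(6)] for i in range(6)]
ker = [[1.0, 2.0, 1.0], [0.0, 1.0, 0.0], [-1.0, 0.5, 2.0]]
full = fft_convolve2d(img, ker)
tiled = tiled_convolve2d(img, ker, tile=4)
max_err = max(abs(a - b) for ra, rb in zip(full, tiled) for a, b in zip(ra, rb))
print("max abs difference between full and tiled result:", max_err)
```

The important detail is that each tile has to be padded by the kernel size minus one in each dimension before transforming; tiles transformed without padding wrap around and do not add up to the full-image result.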
Thank you for answering these questions.
Derek