Fast implementation of two-dimensional convolution?

I wrote a CUDA program for two-dimensional convolution, and now I want to compare it against a non-CUDA implementation to measure the speedup.

I could compare it with my own implementation in plain C using the classic nested-loop approach, or with MATLAB's conv2, but that does not feel like a fair comparison, since neither is among the fastest implementations available.
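For reference, the classic nested-loop baseline mentioned in the question might look like the minimal sketch below. The function name `conv2_full`, double precision, and the row-major layout are assumptions for illustration. It computes the "full" output size, (ih+kh-1) x (iw+kw-1), which is also the default output shape of MATLAB's conv2.

```c
#include <assert.h>

/* Direct ("full") 2D convolution, the classic nested-loop baseline.
 * in is ih x iw, k is kh x kw, out must hold (ih+kh-1) x (iw+kw-1)
 * elements; all arrays are row-major. Work is O(oh * ow * kh * kw).
 * conv2_full is a hypothetical name used only for this sketch. */
static void conv2_full(const double *in, int ih, int iw,
                       const double *k, int kh, int kw,
                       double *out)
{
    assert(ih > 0 && iw > 0 && kh > 0 && kw > 0);
    const int oh = ih + kh - 1, ow = iw + kw - 1;
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            double acc = 0.0;
            /* out[y][x] = sum over (ky, kx) of k[ky][kx] * in[y-ky][x-kx],
             * skipping terms that fall outside the input (zero padding). */
            for (int ky = 0; ky < kh; ++ky) {
                const int i = y - ky;
                if (i < 0 || i >= ih) continue;
                for (int kx = 0; kx < kw; ++kx) {
                    const int j = x - kx;
                    if (j < 0 || j >= iw) continue;
                    acc += k[ky * kw + kx] * in[i * iw + j];
                }
            }
            out[y * ow + x] = acc;
        }
    }
}
```

For example, convolving the 2x2 input [1 2; 3 4] with a 2x2 kernel of ones gives the 3x3 result [1 3 2; 4 10 6; 3 7 4], matching what conv2 returns for the same inputs. A baseline like this is easy to verify, but it is not vectorized, which is exactly why it makes a weak point of comparison for a GPU speedup.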

I also thought about trying OpenCV, and I looked for a SIMD-optimized version, without any luck. Any tips? Should I go with OpenCV?

NOTE: I have read other questions, including this one, but the answers are basically the same as my simple C code, or a discussion of the various methods available.

1 answer

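For large kernels, the fastest general two-dimensional convolution algorithm first takes the FFT of the source, multiplies it element by element with the FFT of the (zero-padded) kernel, and then takes the inverse FFT to get the result (in MATLAB, that is fft2, an elementwise product, and ifft2). So your nested-loop approach is probably not the best baseline.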
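The FFT route rests on the convolution theorem. As a sketch (the symbols are introduced here, not taken from the question): let f be the M x N image, taken as zero outside its bounds, and g the P x Q kernel, and zero-pad both to R x S with R >= M+P-1 and S >= N+Q-1 (the padded versions are written \tilde f and \tilde g). With that padding, the circular convolution computed by the DFT equals the ordinary "full" linear convolution.

```latex
\begin{aligned}
(f * g)[m,n] &= \sum_{i=0}^{P-1}\sum_{j=0}^{Q-1} g[i,j]\, f[m-i,\, n-j],
  \qquad 0 \le m < M+P-1,\ 0 \le n < N+Q-1,\\
f * g &= \mathcal{F}^{-1}\bigl\{\mathcal{F}\{\tilde f\}\odot\mathcal{F}\{\tilde g\}\bigr\}
  \quad \text{restricted to that index range},\\
\text{cost}_{\text{direct}} &\approx (M+P-1)(N+Q-1)\,PQ,
  \qquad \text{cost}_{\text{FFT}} = O\bigl(RS\log(RS)\bigr).
\end{aligned}
```

The FFT route therefore pays off when the kernel area PQ is large compared with log(RS). For small kernels (3x3 or 5x5, say), a well-vectorized direct loop is often faster, so it is worth benchmarking both.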

GSL (the GNU Scientific Library) gives you a standard, fast FFT implementation if you want to use one.

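In addition, if the kernel is separable, you can perform the two-dimensional convolution as two one-dimensional convolutions (one along the rows, then one along the columns), which reduces the work per output pixel from kh*kw to kh+kw multiply-adds for a kh x kw kernel.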
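A minimal sketch of the separable case, assuming the kernel factors as k[ky][kx] = u[ky] * v[kx] (box and Gaussian kernels both factor this way): convolve every row with v, then convolve every column of that intermediate result with u. The function name `conv2_separable`, the row-major layout, and the `-1` error return are assumptions for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Full 2D convolution with a separable kernel k[ky][kx] = u[ky] * v[kx].
 * Pass 1 convolves each input row with v (length kw): ih x (iw + kw - 1).
 * Pass 2 convolves each column of that result with u (length kh):
 * (ih + kh - 1) x (iw + kw - 1). Returns 0 on success, -1 if the
 * temporary buffer cannot be allocated. conv2_separable is hypothetical. */
static int conv2_separable(const double *in, int ih, int iw,
                           const double *u, int kh,
                           const double *v, int kw,
                           double *out)
{
    assert(ih > 0 && iw > 0 && kh > 0 && kw > 0);
    const int oh = ih + kh - 1, ow = iw + kw - 1;
    double *tmp = malloc(sizeof(double) * (size_t)ih * (size_t)ow);
    if (!tmp) return -1;

    /* Pass 1: horizontal 1D full convolution of each row with v. */
    for (int y = 0; y < ih; ++y) {
        for (int x = 0; x < ow; ++x) {
            double acc = 0.0;
            for (int kx = 0; kx < kw; ++kx) {
                const int j = x - kx;
                if (j >= 0 && j < iw) acc += v[kx] * in[y * iw + j];
            }
            tmp[y * ow + x] = acc;
        }
    }

    /* Pass 2: vertical 1D full convolution of each column with u. */
    for (int y = 0; y < oh; ++y) {
        for (int x = 0; x < ow; ++x) {
            double acc = 0.0;
            for (int ky = 0; ky < kh; ++ky) {
                const int i = y - ky;
                if (i >= 0 && i < ih) acc += u[ky] * tmp[i * ow + x];
            }
            out[y * ow + x] = acc;
        }
    }

    free(tmp);
    return 0;
}
```

For example, with u = {1, 2} and v = {1, -1} (that is, the kernel [1 -1; 2 -2]) and the input [1 2; 3 4], the result is [1 1 -2; 5 3 -8; 6 2 -8], the same as a direct nested-loop computation with the full 2x2 kernel. The trade-off is one extra intermediate buffer in exchange for fewer multiply-adds per output.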

OpenCV works well too; if it fits your needs, it should be widely accepted as a fast implementation.

