In my OpenCL code, I use clSetKernelArg to create a "variable size" __local memory buffer for use in my kernels, which is not available in OpenCL per se. See my example:
clSetKernelArg(clKernel, ArgCounter++, sizeof(cl_mem), (void *)&d_B);
...
clSetKernelArg(clKernel, ArgCounter++, sizeof(float)*block_size*block_size, NULL);
...
kernel="
matrixMul(__global float* C,
...
__local float* A_temp,
...
)"
{...
Now my question is, how to do the same in pyopencl?
I looked at the examples that come with pyopencl, but the only thing I could find was the templating approach, which, as far as I understand it, seems like brute force. See this example:
kernel = """
__kernel void matrixMul(__global float* C,...){
...
__local float A_temp[ %(mem_size)d ];
...
}
"""
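To make the templating approach concrete, here is a minimal sketch of how I understand it. It assumes plain Python %-formatting. The kernel body, the name render_kernel, and the block size of 16 are placeholders for illustration only, not my real code.

```python
# Hypothetical kernel template: the body is a placeholder, not the real matrixMul.
KERNEL_TEMPLATE = """
__kernel void matrixMul(__global float* C)
{
    __local float A_temp[ %(mem_size)d ];
    C[get_global_id(0)] = 0.0f;
}
"""


def render_kernel(block_size):
    """Bake the __local array length into the kernel source as a literal."""
    return KERNEL_TEMPLATE % {"mem_size": block_size * block_size}


# Assumed block size, chosen only for the example.
source = render_kernel(16)
```

As far as I can tell, the rendered string then goes to pyopencl.Program(ctx, source).build(), so every different block size means a new source string and a rebuild. That is why it feels like brute force to me.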
What do you recommend?