This should work, provided that you capture the output of the feval call. Consider a trivial kernel like this:
__global__ void setOneEl( double * array, double val, int element ) {
    // element is a 0-based index into array
    array[element] = val;
}
Then, executing the following code in MATLAB works the way I assume you are after:
>> k = parallel.gpu.CUDAKernel('kern.ptx');
>> g = parallel.gpu.GPUArray.zeros(1,10);
>> for ii = 1:2:10, g = k.feval(g, rand, ii); end
>> gather(g)
ans =
0 0.0975 0 0.2785 0 0.5469 0 0.9575 0 0.9649
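In MATLAB, a gpuArray has value semantics, just like any other MATLAB array, so passing a gpuArray to a function does not change the variable in the caller's workspace. For that reason, CUDAKernel.feval returns the modified array as an output, and you need to assign that output back to the variable.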
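
To make the role of the captured output concrete, here is a minimal sketch that contrasts the two call styles. It assumes the same kern.ptx built from the setOneEl kernel above and a supported GPU. The value 0.5 and index 3 are arbitrary choices for illustration, and the comments describe the behaviour I would expect rather than verified output.

```matlab
% Sketch only: assumes kern.ptx contains the setOneEl kernel shown above.
k = parallel.gpu.CUDAKernel('kern.ptx');
g = parallel.gpu.GPUArray.zeros(1,10);

% Output NOT captured: the array written by the kernel is discarded,
% so g is expected to still be all zeros after this call.
k.feval(g, 0.5, 3);

% Output captured: g is rebound to the array the kernel wrote to, so the
% element at 0-based index 3 (MATLAB index 4) is expected to be 0.5.
g = k.feval(g, 0.5, 3);
gather(g)
```

The second form is the same pattern used in the loop above: each iteration feeds the previous result back in and keeps the returned array.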