I searched Google and was only able to find a trivial example of the new dynamic parallelism in Compute Capability 3.5 in one of NVIDIA's technical summaries linked from here. I know that HPC-specific cards are likely to be unavailable until this time next year (after nat'l labs get theirs), and yes, I understand that the simple example they gave is enough to get started, but the more examples, the better.
Are there any other examples that I missed?
To save you the trouble, here is the whole example provided in the technical overview:
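__global__ void ChildKernel(void* data){
    // Operate on data
}
__global__ void ParentKernel(void *data){
    // Launch a child grid from device code
    ChildKernel<<<16, 1>>>(data);
}
// In host code:
ParentKernel<<<256, 64>>>(data);

// continueRecursion is a placeholder for whatever condition ends the recursion
__global__ void RecursiveKernel(void* data){
    if(continueRecursion == true)
        RecursiveKernel<<<64, 16>>>(data);
}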
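For comparison, here is a minimal sketch of how the fragment above might be fleshed out into something that compiles. It is not from the technical overview: the names `childKernel`, `parentKernel`, `recursiveKernel`, `runParentChild` and `runRecursion`, the doubling workload, and the explicit depth counter (standing in for `continueRecursion`) are all assumptions made for illustration.

```cuda
#include <cassert>
#include <vector>
#include <cuda_runtime.h>

// Hypothetical child grid: each thread doubles one element of its slice.
__global__ void childKernel(int *data, int n)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= 2;
}

// Hypothetical parent grid: one thread per block launches a child grid
// over that block's slice of the array.
__global__ void parentKernel(int *data, int sliceLen)
{
    if (threadIdx.x == 0) {
        int *slice = data + blockIdx.x * sliceLen;
        childKernel<<<(sliceLen + 63) / 64, 64>>>(slice, sliceLen);
    }
}

// Recursion bounded by an explicit depth counter (an assumption standing in
// for the continueRecursion placeholder). Records the deepest level reached.
__global__ void recursiveKernel(int *depthReached, int depth, int maxDepth)
{
    atomicMax(depthReached, depth);
    if (depth < maxDepth)
        recursiveKernel<<<1, 1>>>(depthReached, depth + 1, maxDepth);
}

// Host helper: fills an array with 0..n-1, lets the parent grid launch the
// children, and returns the result. The parent grid does not complete until
// its child grids have, so one host-side synchronize covers both levels.
std::vector<int> runParentChild(int numBlocks, int sliceLen)
{
    int n = numBlocks * sliceLen;
    std::vector<int> host(n);
    for (int i = 0; i < n; ++i) host[i] = i;

    int *dev = nullptr;
    cudaMalloc(&dev, n * sizeof(int));
    cudaMemcpy(dev, host.data(), n * sizeof(int), cudaMemcpyHostToDevice);

    parentKernel<<<numBlocks, 64>>>(dev, sliceLen);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaMemcpy(host.data(), dev, n * sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}

// Host helper: starts the recursion at depth 0 and returns the deepest
// level that was actually launched.
int runRecursion(int maxDepth)
{
    int *dev = nullptr;
    int host = -1;
    cudaMalloc(&dev, sizeof(int));
    cudaMemset(dev, 0, sizeof(int));

    recursiveKernel<<<1, 1>>>(dev, 0, maxDepth);
    cudaError_t err = cudaDeviceSynchronize();
    assert(err == cudaSuccess);

    cudaMemcpy(&host, dev, sizeof(int), cudaMemcpyDeviceToHost);
    cudaFree(dev);
    return host;
}
```

To try it, call `runParentChild(4, 128)` and `runRecursion(3)` from a `main` of your own. Device-side launches need relocatable device code and the device runtime library, so the build line would look something like `nvcc -arch=sm_35 -rdc=true example.cu -lcudadevrt` (with `sm_35` replaced by whatever architecture at or above 3.5 your card and toolkit support).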
EDIT:
There is a GTC talk on CUDA that covers the new dynamic parallelism in CUDA 5.