I am developing algorithms in CUDA on my desktop that will later run on a server.
Is it possible to use a recent low-end card (with, for example, compute capability 2.1) to get all the good debugging and profiling features, and then deploy the code on the server using a high-end card (with the same cc)? Would I just need to adjust the thread/grid sizes, or would everything change?
Example: I would develop on a Quadro 600, and the server would use a Tesla C2075.
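To make "adjusting the thread/grid sizes" concrete, here is a minimal sketch of what I have in mind: query the device at run time and derive the launch configuration from it, so the same source could run on both cards. The kernel `scale`, the helper `blocksFor`, and the default of 256 threads per block are placeholders I made up for illustration, not part of my real code.

```cuda
#include <cstdio>
#include <cuda_runtime.h>

// Hypothetical kernel, for illustration only: scales each element in place.
__global__ void scale(float *data, int n, float factor) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) data[i] *= factor;
}

// Number of blocks needed to cover n elements (ceiling division).
int blocksFor(int n, int threadsPerBlock) {
    return (n + threadsPerBlock - 1) / threadsPerBlock;
}

int main() {
    // Ask the runtime what device 0 actually is instead of hard-coding it.
    cudaDeviceProp prop;
    if (cudaGetDeviceProperties(&prop, 0) != cudaSuccess) {
        fprintf(stderr, "no usable CUDA device\n");
        return 1;
    }
    printf("%s: cc %d.%d, %d SMs, max %d threads per block\n",
           prop.name, prop.major, prop.minor,
           prop.multiProcessorCount, prop.maxThreadsPerBlock);

    const int n = 1 << 20;
    // Assumed default of 256 threads, clamped to the device limit.
    int threads = 256 < prop.maxThreadsPerBlock ? 256 : prop.maxThreadsPerBlock;
    int blocks = blocksFor(n, threads);

    float *d_data = NULL;
    if (cudaMalloc(&d_data, n * sizeof(float)) != cudaSuccess) {
        fprintf(stderr, "cudaMalloc failed\n");
        return 1;
    }
    cudaMemset(d_data, 0, n * sizeof(float));

    scale<<<blocks, threads>>>(d_data, n, 2.0f);

    // Launch errors surface via cudaGetLastError, execution errors via the sync.
    cudaError_t err = cudaGetLastError();
    if (err == cudaSuccess) err = cudaDeviceSynchronize();
    if (err != cudaSuccess) {
        fprintf(stderr, "kernel failed: %s\n", cudaGetErrorString(err));
    }

    cudaFree(d_data);
    return err == cudaSuccess ? 0 : 1;
}
```

The idea would be that only the values read from `cudaDeviceProp` differ between the desktop and the server; whether that is enough for good performance on the bigger card is exactly what I am asking.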