Error while loading shared libraries: libcudart.so.4: cannot open shared object file: No such file or directory

I am trying to run code that uses MPI and CUDA on a cluster. The code works fine on a single machine, but when I run it on the cluster I get this error:

error while loading shared libraries: libcudart.so.4: cannot open shared object file: No such file or directory

I checked my PATH and LD_LIBRARY_PATH and everything looks fine. My .bashrc file contains the following entries:

export PATH=$PATH:/usr/local/lib/:/usr/local/lib/openmpi:/usr/local/cuda/bin
export LD_LIBRARY_PATH=$LD_LIBRARY_PATH:/usr/local/lib:/usr/local/lib/openmpi/:/usr/local/cuda/lib

All machines have the same installation of CUDA and OpenMPI.

I also have /usr/local/cuda/lib in /etc/ld.so.conf.

Can anyone help me with this? This problem is really annoying.

Thanks.

1 answer

If you are submitting a batch job to the cluster, add commands like

echo $LD_LIBRARY_PATH 
ldd ./your_app 

to your batch script. They print the library search path and the resolved shared libraries as seen on the compute node, which should help debug the problem.
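As a rough sketch of how those two checks could be wrapped up in a batch script: the helper names `filter_missing` and `diagnose` are made up for illustration, and `./your_app` is the same placeholder as above. It relies on `ldd` printing `not found` next to any library it cannot resolve.

```shell
#!/bin/bash
# Hypothetical helpers for a batch script; not part of CUDA or Open MPI.

# Keep only the "not found" lines from ldd-style output read on stdin.
# Exit status is 0 when at least one library is missing, 1 otherwise.
filter_missing() {
    grep 'not found'
}

# Report the node name, its library search path, and any unresolved libraries.
diagnose() {
    local app="$1"
    if [ ! -x "$app" ]; then
        echo "no such executable: $app" >&2
        return 2
    fi
    echo "host: $(uname -n)"
    echo "LD_LIBRARY_PATH: ${LD_LIBRARY_PATH:-<unset>}"
    if ldd "$app" | filter_missing; then
        echo "unresolved shared libraries on $(uname -n)" >&2
        return 1
    fi
    echo "all shared libraries resolved"
}

# In the batch script, before launching the job:
# diagnose ./your_app || exit 1
```

If `libcudart.so.4` shows up in the output, the environment on that node differs from the one in your interactive shell.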

Also make sure that you export environment variables to mpirun. For example, in OpenMPI you run your code with

mpirun -x LD_LIBRARY_PATH ...
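
A small sketch of forwarding more than one variable, assuming Open MPI's `mpirun` with its `-x` and `-np` options. The function name `mpirun_env_flags`, the process count of 4, and `./your_app` are placeholders. The last line only prints the command so you can inspect it before running it.

```shell
# Hypothetical helper: turn a list of variable names into "-x NAME" flags.
mpirun_env_flags() {
    local flags="" var
    for var in "$@"; do
        flags="$flags -x $var"
    done
    # Strip the leading space left by the first iteration.
    echo "${flags# }"
}

# Dry run: show the command instead of executing it.
echo "mpirun -np 4 $(mpirun_env_flags PATH LD_LIBRARY_PATH) ./your_app"
```

This may be needed because processes started on remote nodes do not necessarily read the same .bashrc settings as your interactive login shell.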