Getting support

Trying to solve issues by yourself

The first thing to do when a job crashes is to check its output files (the paths given with the -o and -e PBS options). In most cases, the details of the crash are written there in human-readable form.

Example:
Your job crashes every time you run it and your output files contain:

output
[...]
Traceback (most recent call last):
 File "<stdin>", line 1, in <module>
 File "/shared/compilers/python/2.7.5/gnu/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/__init__.py", line 13, in <module>
   from . import _errors
ImportError: libhdf5.so.9: cannot open shared object file: No such file or directory

As shown above, the library libhdf5.so.9 cannot be found. To solve this yourself, ask where the expected library is located (evidently not in the default environment). The answer is to load the corresponding module:

$ module load hdf5

This is a simple example, but it is representative of the vast majority of user support requests.
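The check described above can be partly automated. The following is a minimal sketch, assuming a shell with grep (supporting -o), sed and sort. The function name missing_libs and the file name myjob.e12345 are hypothetical, and the pattern only covers the dynamic-loader message shown in the example.

```shell
# Print the name of each shared library reported as missing in a job error file.
# Hypothetical helper: it only matches the "cannot open shared object file" message.
missing_libs() {
    grep -o 'lib[^ :]*\.so[^ :]*: cannot open shared object file' "$1" \
        | sed 's/: cannot open shared object file$//' \
        | sort -u
}

# Example (hypothetical error file name):
#   missing_libs myjob.e12345
# Then look for a module that provides the library and load it, e.g.:
#   module avail hdf5
#   module load hdf5
```

If the function prints a library name, searching the available modules for that name is usually the next step before asking for help.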

Asking your PI for help

After trying to solve the problem yourself, you should ask your PI for help. Your PI probably has long-term experience with the computing resources available here and/or elsewhere.

Asking the LBT computing resources user community for help

Because you are part of the user community of the LBT's computing resources, a mailing list has been created for sharing issues and experience: send an email to cluster-users@ibpc.fr. If you are not on this list, contact the IT manager (geoffrey.letessier@cnrs.fr) and/or the IBPC IT team (lbt-info@ibpc.fr) to be registered.

This mailing list is also used to inform you about changes to the computing resources.

Asking the IT manager for help

All user-support requests concerning the LBT's computing resources should be sent by email to lbt-info@ibpc.fr.

That said, if, and only if, your request concerns the LBT's computational and storage resources (not the IBPC network and services, nor desktop machines), you can contact me directly by sending a well-formed email to geoffrey.letessier@cnrs.fr.

If possible, your email should include the following 4 pieces of information:

  • A description of your issue.
  • Whether your issue is reproducible (i.e. does it happen every time under the same conditions?).
  • The full path of your crashing job.
  • The 2 output files provided by the Torque resource manager (the files generated by the -e and -o PBS options in your job script), making sure you are using the script I provided here.

If your problem is that your job is stuck in the queue, please do not kill it; instead, send me the output of the “checkjob -vv <job-ID>” command.
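The items requested above can be gathered into a single text report before writing the email. This is only a sketch under stated assumptions: the function name support_report and its argument order are hypothetical, and the checkjob step runs only if a job ID is given and the command is available on the machine.

```shell
# Gather the requested items into one text report to paste into or attach to the email.
# Hypothetical helper; arguments: job script path, stdout file, stderr file, [job ID].
support_report() {
    job_script=$1
    out_file=$2
    err_file=$3
    job_id=${4:-}

    echo "Job script: $job_script"
    echo "--- stdout ($out_file) ---"
    cat "$out_file"
    echo "--- stderr ($err_file) ---"
    cat "$err_file"

    # For a job stuck in the queue, add the scheduler diagnostics (the job is not killed).
    if [ -n "$job_id" ] && command -v checkjob >/dev/null 2>&1; then
        echo "--- checkjob -vv $job_id ---"
        checkjob -vv "$job_id"
    fi
}

# Example (hypothetical paths and job ID):
#   support_report /path/to/myjob.pbs myjob.o12345 myjob.e12345 12345 > report.txt
```

The report does not replace the description of the issue or the note on reproducibility, which still need to be written by hand.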

cluster-lbt/getting_support.txt · Last modified: 2018/06/28 19:05 by admin