Getting support
Trying to solve issues by yourself
When a job crashes, the first thing to do is check its output files (the ones specified with the -o and -e PBS options). In most cases, the details of the crash are written there in human-readable form.
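As a minimal sketch of this first check, the helper below prints the last lines of each log file it is given. The function name `show_job_logs` and the file names `myjob.out` and `myjob.err` are assumptions made for illustration; use whatever names your own job script passes to -o and -e.

```shell
# In the job script, the two files are set with directives such as:
#   #PBS -o myjob.out
#   #PBS -e myjob.err
# (the file names here are hypothetical).

# Print the last 20 lines of each log file, or say that it is missing.
# Usage: show_job_logs myjob.out myjob.err
show_job_logs() {
    for f in "$@"; do
        if [ -f "$f" ]; then
            printf '==> %s <==\n' "$f"
            tail -n 20 "$f"
        else
            printf '==> %s (not found) <==\n' "$f"
        fi
    done
}
```

The error file (-e) is usually the more informative of the two, since that is where tracebacks and loader errors end up.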
Example:
Your job crashes every time you run it and, in your output files, you can read:
- output
[...]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/shared/compilers/python/2.7.5/gnu/lib/python2.7/site-packages/h5py-2.5.0-py2.7-linux-x86_64.egg/h5py/__init__.py", line 13, in <module>
    from . import _errors
ImportError: libhdf5.so.9: cannot open shared object file: No such file or directory
As the last line of the traceback shows, the library libhdf5.so.9 cannot be found. To solve this yourself, ask where the expected library is located (evidently not in the default environment); the answer is to load the corresponding module:
$ module load hdf5
This example is easy to understand, but problems of this kind account for the vast majority of user support requests.
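The same diagnosis can be sketched as a small helper that scans a job error file for "cannot open shared object file" messages and prints the name of each missing library. The function name `missing_libs` and the file name `myjob.err` are assumptions for illustration, and the pattern only covers loader messages worded like the one in the example above.

```shell
# Print the name of every shared library reported as missing in a log file.
# Usage: missing_libs myjob.err   (hypothetical file name)
missing_libs() {
    grep -oE '[^[:space:]:]+\.so[.0-9]*: cannot open shared object file' "$1" \
        | cut -d: -f1 \
        | sort -u
}
```

Each name it prints hints at the module to look for. Assuming the cluster uses Environment Modules, as the `module load` command above suggests, `module avail hdf5` lists the matching modules that can be loaded.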
Asking your PI for help
If you cannot solve the problem by yourself, ask your PI, who probably has long experience with the computing resources available here and/or elsewhere.
Asking the user community of LBT's computing resources for help
As a user of the LBT's computing resources, you are part of a user community with its own mailing list: you can share your issues and experience by sending an e-mail to cluster-users@ibpc.fr. If you are not yet on this list, contact the IBPC IT team (at lbt-info@ibpc.fr) to be registered.
This mailing list will also be used to inform you about changes to the computing resources.
Asking the IT manager for help
All user-support requests concerning LBT's computing resources should be sent by email to lbt-info@ibpc.fr.
That said, if, and only if, your request concerns the LBT's computational and storage resources (not the IBPC network and services, nor desktop machines), you can contact me directly by sending a well-formed email to geoffrey.letessier@cnrs.fr.
What I mean by “well-formed email”: if possible, your email should include the following 4 pieces of information:
- Description of your issue.
- Is your issue reproducible? (i.e., does it happen every time under the same conditions?)
- The full path of your crashing job
- The 2 output files provided by the Torque resource manager (the files generated by the -e and -o PBS options in your job script), making sure you are using the script I provided here
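The four items above can be gathered into a single block of text before writing the email. The helper below is only a sketch: the function name `support_report`, its argument order, and the example values are assumptions, not part of any tool provided on the cluster.

```shell
# Print a support-request summary covering the four items listed above.
# Usage (all values hypothetical):
#   support_report "job dies at startup" yes /path/to/job myjob.out myjob.err
support_report() {
    printf 'Description:  %s\n' "$1"
    printf 'Reproducible: %s\n' "$2"
    printf 'Job path:     %s\n' "$3"
    # Append each Torque output file, or note that it is missing.
    for f in "$4" "$5"; do
        printf '\n----- %s -----\n' "$f"
        if [ -f "$f" ]; then
            cat "$f"
        else
            printf '(file not found)\n'
        fi
    done
}
```

Redirecting the output to a file gives you something you can paste into, or attach to, the email.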