Below are common issues you can solve yourself:
If you see this kind of output when you try to connect to the clusters remotely:
$ ssh baal
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
@ WARNING: REMOTE HOST IDENTIFICATION HAS CHANGED! @
@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@@
IT IS POSSIBLE THAT SOMEONE IS DOING SOMETHING NASTY!
Someone could be eavesdropping on you right now (man-in-the-middle attack)!
It is also possible that the RSA host key has just been changed.
The fingerprint for the RSA key sent by the remote host is
8a:51:9a:2c:78:03:51:03:39:f3:03:1f:aa:2f:56:c7.
Please contact your system administrator.
Add correct host key in /home/ibpcadmin/.ssh/known_hosts to get rid of this message.
Offending key in /home/ibpcadmin/.ssh/known_hosts:5
RSA host key for baal.lbt.ibpc.fr has changed and you have requested strict checking.
Host key verification failed.
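This is probably because the host key of the server has changed. If you have any doubt, contact your system administrator first, as the message suggests. The easiest way to solve this is to remove the old key, so that your SSH client records the new one in your ~/.ssh/known_hosts file the next time you connect.
To remove it, delete the offending line (the -i.bak option keeps a backup copy in ~/.ssh/known_hosts.bak):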
$ sed -i.bak '<line number>d' ~/.ssh/known_hosts
For example, the message above reports the offending key on line 5 of the known_hosts file (known_hosts:5). To remove that line while keeping a backup:
$ sed -i'.bak' '5d' ~/.ssh/known_hosts
Alternatively, you can let ssh-keygen remove all entries for that host from the file:
$ ssh-keygen -f ~/.ssh/known_hosts -R "baal.lbt.ibpc.fr"
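The manual steps above can be sketched as two small shell functions. This is only an illustration, not a tool provided on the clusters: the function names offending_line and remove_known_host_line are hypothetical, and the sketch assumes the warning contains a line of the form "Offending key in <file>:<number>", as in the example output.

```shell
# Hypothetical helper: read an ssh warning on stdin and print the line
# number found in "Offending key in <file>:<n>" (newer clients may
# print "Offending RSA key in ...", which is matched as well).
offending_line() {
    sed -n 's/^Offending .*key in .*:\([0-9][0-9]*\)$/\1/p' | head -n 1
}

# Hypothetical helper: delete line <n> from a known_hosts file,
# leaving a backup copy in <file>.bak.
remove_known_host_line() {
    file=$1
    line=$2
    case $line in
        ''|*[!0-9]*) echo "not a line number: $line" >&2; return 1 ;;
    esac
    sed -i.bak "${line}d" "$file"
}

# Possible usage (commented out, since it needs a reachable host):
#   n=$(ssh baal true 2>&1 | offending_line)
#   remove_known_host_line ~/.ssh/known_hosts "$n"
```

Refusing anything that is not a plain number avoids passing an empty or malformed value to sed, which could otherwise delete the wrong lines.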
Sometimes, after you submit a job, it stays blocked in the queue for no obvious reason. In this case, the first thing to do is to check the job status:
$ checkjob -vv <job-ID>
$ checkjob -vv 18049
checking job 18049 (RM job '18049.torque1.cluster.lbt')
State: Idle EState: Deferred
Creds: user:admin group:admin_team account:baaden_project class:monop qos:DEFAULT
WallTime: 00:00:00 of 1:00:00
SubmitTime: Mon Oct 12 17:20:09
(Time Queued Total: 00:00:00 Eligible: 00:00:00)
StartDate: 00:00:01 Mon Oct 12 17:20:10
Total Tasks: 1
Req[0] TaskCount: 1 Partition: ALL
Network: [NONE] Memory >= 0 Disk >= 0 Swap >= 0
Opsys: [NONE] Arch: [NONE] Features: [smp-nodes]
Exec: '' ExecSize: 0 ImageSize: 0
Dedicated Resources Per Task: PROCS: 1
NodeAccess: SHARED
NodeCount: 1
IWD: [NONE] Executable: [NONE]
Bypass: 0 StartCount: 0
PartitionMask: [ALL]
job is deferred. Reason: BankFailure (cannot debit job account)
Holds: Defer (hold reason: BankFailure)
PE: 1.00 StartPriority: 24
cannot select job 18049 for partition DEFAULT (job hold active)
In the output above, note the reason “BankFailure (cannot debit job account)”. It means the job cannot be charged to its credit account: the credit is completely used up or has expired, the account does not exist (check for a typing error in the account name), or you are not a member of this credit account.
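Because the checkjob output is long, it can help to pull out only the deferral reason. The following is a minimal sketch; the function name job_defer_reason is hypothetical, and it assumes the output contains a line of the form "job is deferred. Reason: <name> (...)", as in the example above.

```shell
# Hypothetical helper: read checkjob output on stdin and print the
# name that follows "Reason:" (for example BankFailure).
# The "Holds:" line uses a lowercase "hold reason:", so it is not matched.
job_defer_reason() {
    sed -n 's/.*Reason:[[:space:]]*\([A-Za-z]*\).*/\1/p' | head -n 1
}

# Possible usage (commented out, since it needs the scheduler):
#   checkjob -vv 18049 | job_defer_reason
```

If the function prints nothing, the job is not deferred in this way and the full checkjob output should be read instead.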