Slurm Errors

yeahmag
Posts: 3
Joined: Thu Sep 12, 2024 4:39 am
Full Name: Aaron P McKinnon
Organization: California Institute of Technology

Slurm Errors

Post by yeahmag

We have successfully deployed the basics of WebMO with Shibboleth on a submission host on our HPCC. We have the enterprise license, which allows us to use Slurm to submit jobs to the cluster. Users can log in via Shibboleth and do basic, non-batch submission work, but we are missing something... We get the following error when submitting jobs:

sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified

From the WebMO error log at the same time:

[Wed Sep 11 11:56:07 2024] execute_input.cgi: sbatch: error: Batch job submission failed: Invalid account or account/partition combination specified at daemon_pbs.cgi line 477.
[Wed Sep 11 11:56:07 2024] execute_input.cgi: Undefined subroutine &main::generate_job_manager_page called at /var/www/cgi-bin/webmo/execute_input.cgi line 48.

The batch queues have been created with the partition names we use in Slurm and assigned to the interfaces. Here is an example from the top of a pbs_script.sh file generated by WebMO:

#!/bin/sh
# Submitted using: /central/slurm/install/current/bin/sbatch -J WebMO_4 -o /srv/webmo/mckinnon/4/pbs_stdout -e /srv/webmo/mckinnon/4/pbs_stderr -p 'expansion' --nodes=1 --tasks-per-node=1 --time=01:00:00

We aren't sure what we are missing...
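An "Invalid account or account/partition combination" error generally means Slurm found no association that lets the submitting user run in the requested partition under the given (or default) account. A minimal sketch for checking that, assuming Slurm accounting is enabled and `sacctmgr` is available on the submission host: the hypothetical `partition_allowed` function reads `account|partition` lines on stdin and prints the accounts whose association covers the requested partition. It assumes a blank partition field means the association is not tied to a specific partition, and it does not check partition-level restrictions such as `AllowAccounts` in slurm.conf.

```sh
#!/bin/sh
# Hypothetical helper: report which Slurm accounts let a user submit to a partition.
# Input on stdin: "account|partition" lines, e.g. from
#   sacctmgr -nP show associations where user="$USER" format=account,partition
# Assumption: an empty partition field means the association applies to all partitions.
partition_allowed() {
    awk -F'|' -v want="$1" '
        # Keep associations for the requested partition or for no specific partition
        NF >= 2 && ($2 == want || $2 == "") { print $1; found = 1 }
        # Exit status 0 if at least one association matched, 1 otherwise
        END { if (found) exit 0; exit 1 }
    '
}
```

For example, piping the `sacctmgr` command from the comment into `partition_allowed expansion` (as the user WebMO actually submits as) returns a non-zero status if no association covers the 'expansion' partition used in the script above.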
schmidt
Posts: 96
Joined: Sat May 30, 2020 3:00 pm
Full Name: JR Schmidt
Organization: WebMO, LLC

Re: Slurm Errors

Post by schmidt

"Users can log in via Shibboleth and do basic, non-batch submission work": Since you are logging in via Shibboleth, this suggests that these users do not have corresponding Linux accounts on the cluster. In that case, WebMO will run jobs under the UID under which the WebMO scripts themselves are executing. This will vary depending on configuration. If you have installed WebMO under /home/webmo/public_html, the scripts will typically run under the UID of 'webmo'. If you have installed WebMO under /var/www, they will run under the web server user, 'httpd'.

The latter case poses a problem if you try to integrate with SLURM, because 'httpd' is not a valid login account and SLURM will not run jobs for user 'httpd'.
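One way to see which case applies is to check whether the account the WebMO CGI scripts run as is a real login account. A minimal sketch, assuming `getent` is available (so LDAP/SSSD users resolve as well as local ones) and treating a `nologin` or `false` shell as "not a login account"; the function name `is_login_account` is hypothetical, and the web server user differs by distribution (for example 'apache' or 'www-data' rather than 'httpd').

```sh
#!/bin/sh
# Hypothetical check: is the given user a real login account?
# getent also finds directory (LDAP/SSSD) users, not only /etc/passwd entries.
# Returns 0 = login account, 1 = service account without a login shell, 2 = unknown user.
is_login_account() {
    entry=$(getent passwd "$1") || return 2    # user does not exist at all
    shell=${entry##*:}                         # last (7th) field of the passwd entry
    case "$shell" in
        */nologin|*/false) return 1 ;;         # assumption: these mark service accounts
        *) return 0 ;;                         # an empty shell field defaults to /bin/sh
    esac
}
```

For example, `is_login_account "$(id -un)"` run in the same context as the CGI scripts shows whether jobs would be submitted from a login account.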

The best solution is to properly install WebMO under /home/webmo, or otherwise force the scripts to run under UID 'webmo'. This should then allow SLURM jobs to be submitted.
yeahmag

Re: Slurm Errors

Post by yeahmag

The users do have accounts on the cluster itself. In the example, that was my own user, which I can guarantee exists on the system. We are wondering whether it's a sudo-style issue, where the WebMO process needs to switch (su) to the user's account to run the job. That doesn't seem to be something that can be enabled with Shibboleth. So, can you enable sudo with Shibboleth in WebMO?

The other thing we have found is that the srun command used to enumerate hosts in daemon_pbs.cgi is hardcoded without the "-t" (time limit) flag that our Slurm configuration requires.

Any advice is welcome!

Thanks

-Aaron
schmidt

Re: Slurm Errors

Post by schmidt

You are correct. The 'sudo' option is not enabled for Shibboleth, because Shibboleth users typically would not map onto local cluster users. (Often with Shibboleth SSO, *all* campus users would have access, which would be unusual for a cluster.)

Although this is not officially supported (or tested), if the Shibboleth usernames map exactly to cluster usernames, you may have success by manually editing the WebMO configuration file (cgi-bin/interfaces/globals.int) and changing the "sudoEnabled" value to "1" (true).
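A minimal sketch of that edit, assuming globals.int keeps each setting on its own line in the `key="value"` form that the grep output later in this thread shows. It keeps a `.bak` copy of the original file. The function name `enable_sudo` and the example path are assumptions, and this is not an officially supported procedure.

```sh
#!/bin/sh
# Hypothetical helper: turn on WebMO's sudoEnabled flag in globals.int.
# Assumption: the setting appears on its own line, e.g. sudoEnabled="0".
enable_sudo() {
    conf="$1"    # e.g. /var/www/cgi-bin/webmo/interfaces/globals.int (assumed path)
    [ -f "$conf" ] || { echo "not found: $conf" >&2; return 1; }
    # Rewrite the whole sudoEnabled line; -i.bak saves the original as "$conf.bak"
    sed -i.bak 's/^sudoEnabled=.*/sudoEnabled="1"/' "$conf"
    # Fail if the key was not present, since sed silently changes nothing then
    grep -q '^sudoEnabled="1"$' "$conf"
}
```

Run it as a user with write access to the file, for example `enable_sudo /var/www/cgi-bin/webmo/interfaces/globals.int`.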
yeahmag

Re: Slurm Errors

Post by yeahmag

No change in behavior there:

# grep sudo globals.int
sudoEnabled="1"
sudoPath="/usr/bin/sudo"

I reloaded Apache, and it still doesn't show as enabled in systemmgr_admin.cgi. Is it expected not to "check" the box? I also checked the journal, and WebMO doesn't appear to attempt to use sudo.

Unless you disagree, my next step would be to try PAM.

Thanks!