vasp jobs not appearing to execute

smcgrat
Posts: 14
Joined: Mon Jul 06, 2020 4:51 pm
Full Name: Sean McGrath
Organization: Trinity College Dublin

vasp jobs not appearing to execute

Post by smcgrat »

Hello again, sorry to be a repeated nuisance.

Background & Summary
VASP has been added as an interface and a small test job submitted. However, VASP appears to hang when it is executed on the assigned node of the HPC cluster that WebMO runs on.

The necessary POTCAR file is empty. (That is an issue we will have to resolve later.)

Behavior

This is the contents of the run_log:

Code: Select all

[webmo@pople-n005 18]$ cat run_log
Executing script: ./run_vasp.cgi
Creating working directory: /tmp/webmo-16380/18
Script execution node: pople-n005.cluster
Job execution node(s): pople-n005.cluster
Executing command: /home/support/apps/intel/15.0.6/impi/5.0.3.049/intel64/bin/mpirun -np 1 -machinefile /tmp/hvfk1MvL6t /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended

Here are the contents of the job directory:

Code: Select all

-rw-r--r--  1 webmo webmo   51 Jul 23 17:32 zmatrix
-rw-r--r--  1 webmo webmo  201 Jul 23 17:32 input.xyz
-rw-r--r--  1 webmo webmo   15 Jul 23 17:32 charges
drwxr-xr-x 19 webmo webmo  168 Jul 23 17:32 ..
-rw-r--r--  1 webmo webmo  234 Jul 23 17:32 job_options
-rw-r--r--  1 webmo webmo  184 Jul 23 17:32 summary
-rw-r--r--  1 webmo webmo    0 Jul 23 17:32 notes
-rw-r--r--  1 webmo webmo  221 Jul 23 17:32 input.poscar
-rw-r--r--  1 webmo webmo   47 Jul 23 17:32 input.kpoints
-rw-r--r--  1 webmo webmo   59 Jul 23 17:32 input.inp
lrwxrwxrwx  1 webmo webmo   53 Jul 23 17:32 POSCAR -> /usr/local/webmo/private/webmo/graeme/18/input.poscar
lrwxrwxrwx  1 webmo webmo   54 Jul 23 17:32 KPOINTS -> /usr/local/webmo/private/webmo/graeme/18/input.kpoints
lrwxrwxrwx  1 webmo webmo   50 Jul 23 17:32 INCAR -> /usr/local/webmo/private/webmo/graeme/18/input.inp
-rw-r--r--  1 webmo webmo    0 Jul 23 17:32 POTCAR
-rw-r--r--  1 webmo webmo 1.9K Jul 23 17:32 pbs_script.sh
-rw-r--r--  1 webmo webmo    0 Jul 23 17:32 pbs_stdout
-rw-r--r--  1 webmo webmo  353 Jul 23 17:32 run_log
-rw-r--r--  1 webmo webmo    0 Jul 23 17:32 output.out.stdout
lrwxrwxrwx  1 webmo webmo   58 Jul 23 17:32 output.out -> /usr/local/webmo/private/webmo/graeme/18/output.out.stdout
-rw-r--r--  1 webmo webmo   96 Jul 23 17:33 output.out.stderr
-rw-r--r--  1 webmo webmo  255 Jul 24 15:07 pbs_stderr
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 CHGCAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 WAVECAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 EIGENVAL
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 CONTCAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 DOSCAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 OSZICAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 PCDAT
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 XDATCAR
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 REPORT
-rw-rw-r--  1 webmo webmo    0 Jul 24 15:08 CHG
drwxr-xr-x  2 webmo webmo 4.0K Jul 24 15:08 .
-rw-rw-r--  1 webmo webmo  401 Jul 24 15:14 OUTCAR
-rw-rw-r--  1 webmo webmo  746 Jul 24 15:14 vasprun.xml

No output is generated, though, and the processes just sit idle.

These are the relevant processes running on the node in the cluster:

Code: Select all

$ ps auxww | grep webmo
webmo     5929  0.0  0.0  23928  1596 ?        S    Jul23   0:00 /bin/sh /var/spool/slurmd/job21586/slurm_script
webmo     6094  0.0  0.0  34300  5188 ?        S    Jul23   0:00 /usr/bin/perl ./run_vasp.cgi 18 graeme compute
webmo     6097  0.0  0.0  23936  1556 ?        S    Jul23   0:00 /bin/sh /home/support/apps/intel/15.0.6/impi/5.0.3.049/intel64/bin/mpirun -np 1 -machinefile /tmp/hvfk1MvL6t /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended
webmo     6103  0.0  0.0  20300  1544 ?        S    Jul23   0:00 mpiexec.hydra -np 1 -machinefile /tmp/hvfk1MvL6t /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended
webmo     6104  0.0  0.0      0     0 ?        Z    Jul23   0:00 [srun] <defunct>
root     11664  0.0  0.0 103320   868 pts/1    S+   15:07   0:00 grep --color=auto webmo

Even after 18 hours or so there has been no failure and no output from what should be a short job.
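
The process list above includes a defunct (zombie) srun entry. As a minimal sketch for spotting such entries, the hypothetical helper below (list_zombies is not part of WebMO or Slurm) filters "PID STAT COMMAND" lines and prints the PIDs whose state starts with Z. It assumes a Linux procps ps that supports the pid, stat and args output columns.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebMO or Slurm): print the PIDs of
# zombie processes from "PID STAT COMMAND..." lines read on stdin.
# A STAT field that starts with Z marks a defunct process.
list_zombies() {
    awk '$2 ~ /^Z/ { print $1 }'
}

# Possible use on the compute node, assuming procps ps and the webmo user:
#   ps -u webmo -o pid=,stat=,args= | list_zombies
```

A zombie only shows that the child has exited and its parent has not yet collected it; on its own it does not explain why the parent is waiting.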

Expected behavior

The job should actually fail, not just hang. The POTCAR file is empty, so that should cause it to break. But the WebMO-launched job doesn't get that far; it just hangs there indefinitely.
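
One way to surface the empty POTCAR before VASP starts is a pre-flight check in the job directory. This is only a sketch: check_vasp_inputs is a hypothetical helper, not something WebMO provides, and it assumes the four standard VASP input names seen in the listing above.

```shell
#!/bin/sh
# Hypothetical pre-flight check (not part of WebMO): report any VASP input
# file that is missing or empty in the given directory (default: current).
check_vasp_inputs() {
    dir=${1:-.}
    status=0
    for f in INCAR POSCAR KPOINTS POTCAR; do
        # -s is true only for an existing file with non-zero size;
        # symlinks such as INCAR -> input.inp are followed.
        if [ ! -s "$dir/$f" ]; then
            echo "missing or empty: $dir/$f" >&2
            status=1
        fi
    done
    return $status
}

# Possible use inside the job directory:
#   check_vasp_inputs . || echo "inputs incomplete, VASP would fail"
```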

Running some of the same commands by hand leads to these errors:

Code: Select all

[webmo@pople-n005 18]$ bash /home/support/apps/intel/15.0.6/impi/5.0.3.049/intel64/bin/mpirun -np 1 -machinefile /tmp/hvfk1MvL6t /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Nov 07 2019 11:29:47) complex          
  
 POSCAR found :  1 types and       2 ions
 scaLAPACK will be used
 ERROR: number of potentials on File POTCAR incompatible with number of species
 INCAR :           1 POTCAR:            0
[webmo@pople-n005 18]$ /home/support/apps/intel/15.0.6/impi/5.0.3.049/intel64/bin/mpirun -np 1 -machinefile /tmp/hvfk1MvL6t /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended
 running on    1 total cores
 distrk:  each k-point on    1 cores,    1 groups
 distr:  one band on    1 cores,    1 groups
 using from now: INCAR     
 vasp.5.4.4.18Apr17-6-g9f103f2a35 (build Nov 07 2019 11:29:47) complex          
  
 POSCAR found :  1 types and       2 ions
 scaLAPACK will be used
 ERROR: number of potentials on File POTCAR incompatible with number of species
 INCAR :           1 POTCAR:            0

Apologies if I have missed anything simple. If any more information is needed, please let me know.

Any pointers would be very much appreciated.

Many thanks in advance.

Sean

schmidt
Posts: 83
Joined: Sat May 30, 2020 3:00 pm
Full Name: JR Schmidt
Organization: WebMO, LLC

Re: vasp jobs not appearing to execute

Post by schmidt »

This should be debuggable from the command line. The script that WebMO submits to SLURM is pbs_script.sh, stored in the job directory. The exact args it uses are stored as a comment. Try submitting that script from the command line and see what happens to the job. Does it hang?
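
As a rough sketch of pulling that comment out of the script: the hypothetical helper below prints comment lines that mention sbatch, with the leading "#" removed. The exact layout of the comment WebMO writes is an assumption here, so check the script by eye if nothing is printed.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebMO): print comment lines in a job
# script that mention "sbatch", stripped of the leading "#" and spaces.
# Assumes the submission arguments sit on a single comment line.
show_submit_comment() {
    sed -n '/^#.*sbatch/ s/^#[[:space:]]*//p' "$1"
}

# Possible use inside the job directory:
#   show_submit_comment pbs_script.sh
```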

The POTCAR issue is likely (?) that the VASP pseudopotential directory is not set correctly. The directory specified (in the interface manager) should have SUBDIRECTORIES like 'potpaw_GGA', 'potpaw_GGE', etc.
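
A quick way to test a candidate directory against the names WebMO is expected to look for is sketched below. check_potential_dirs is a hypothetical helper, and the subdirectory names are whatever you pass in; the full list WebMO uses is not given here.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebMO): check that each named
# subdirectory exists under the pseudopotential base directory.
# Usage: check_potential_dirs BASE NAME [NAME...]
check_potential_dirs() {
    base=$1
    shift
    status=0
    for name in "$@"; do
        # -d follows symlinks, so a symlinked subdirectory also counts
        if [ -d "$base/$name" ]; then
            echo "ok: $base/$name"
        else
            echo "missing: $base/$name" >&2
            status=1
        fi
    done
    return $status
}

# Possible use (the base path is a placeholder):
#   check_potential_dirs /path/to/pseudopotentials potpaw_GGA
```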

smcgrat
Posts: 14
Joined: Mon Jul 06, 2020 4:51 pm
Full Name: Sean McGrath
Organization: Trinity College Dublin

Re: vasp jobs not appearing to execute

Post by smcgrat »

Thanks schmidt.

When I try to manually submit the job with the following

Code: Select all

/bin/sbatch --reservation=webmo -J WebMO_24 -o /usr/local/webmo/private/webmo/graeme/24/pbs_stdout -e /usr/local/webmo/private/webmo/graeme/24/pbs_stderr -p 'compute' --nodes=1 --tasks-per-node=1 pbs_script.sh

I still get the same behavior. Nothing appears to happen. Some processes launch but then do nothing, and there is no output.

One issue seemed to be that the POTCAR file was empty. But when I copied in the relevant POTCAR file and tried to run vasp on it with the above sbatch ... command the same thing happened, i.e. the processes just hung there.

I manually ran the following command, which I took from the run_log file created by WebMO, from an interactive allocation on the cluster:

Code: Select all

/home/support/apps/intel/15.0.6/impi/5.0.3.049/intel64/bin/mpirun -np 1 -machinefile /tmp/WUFoLevehK /home/users/webmo/vasp/vasp.5.4.4_pople01_parallel_complex_extended

It does generate output as expected.

That was after manually copying in the POTCAR file as follows:

Code: Select all

cp /home/users/webmo/vasp/vasp_5.4/PBE/Mg/POTCAR /usr/local/webmo/private/webmo/graeme/24/POTCAR

This is the structure of the VASP pseudopotential directory:

Code: Select all

ls /home/users/webmo/vasp/vasp_5.4/
LDA  PBE  potpaw_PBE.52

Does that look like what WebMO expects? Each subfolder then contains a series of subfolders that appear to correspond to the elements being modelled.

Can anyone suggest why the job appears to hang in this fashion? If any more information is needed to help, please let me know.

Best

Sean

smcgrat
Posts: 14
Joined: Mon Jul 06, 2020 4:51 pm
Full Name: Sean McGrath
Organization: Trinity College Dublin

Re: vasp jobs not appearing to execute

Post by smcgrat »

Hello,

We still haven't been able to get this working. Any assistance would be very much appreciated please.

Regards

Sean

smcgrat
Posts: 14
Joined: Mon Jul 06, 2020 4:51 pm
Full Name: Sean McGrath
Organization: Trinity College Dublin

Re: vasp jobs not appearing to execute

Post by smcgrat »

OK, so I took another poke at this.

It turns out that the errors log contained messages about the POTCAR.Z file not being found:

Code: Select all

$ tail -1 /usr/local/webmo/private/webmo/errors
gzip: /home/users/webmo/vasp/vasp_5.4/potpaw_PBE/Mg/POTCAR.Z: No such file or directory

So I created a symlink so that the directory has the name WebMO expects:

Code: Select all

$ ls -altrh /home/users/webmo/vasp/vasp_5.4/potpaw_PBE
lrwxrwxrwx 1 webmo webmo 46 Aug 12 15:39 /home/users/webmo/vasp/vasp_5.4/potpaw_PBE -> /home/users/webmo/vasp/vasp_5.4/potpaw_PBE.52/

I then created a POTCAR.Z file with gzip and copied it to the location named in the errors log.

That stopped that particular error from appearing in the errors log.

The job still hangs, though, whether it is submitted through the WebMO web interface or as a Slurm job as follows:

Code: Select all

/bin/sbatch --reservation=webmo -J WebMO_35 -o /usr/local/webmo/private/webmo/graeme/35/pbs_stdout -e /usr/local/webmo/private/webmo/graeme/35/pbs_stderr -p 'compute' --nodes=1 --tasks-per-node=1 /usr/local/webmo/private/webmo/graeme/35/pbs_script.sh

Any advice on why the vasp submission isn't working would be much appreciated.

Sean

smcgrat
Posts: 14
Joined: Mon Jul 06, 2020 4:51 pm
Full Name: Sean McGrath
Organization: Trinity College Dublin

Re: vasp jobs not appearing to execute

Post by smcgrat »

Brief update: we managed to get a VASP job to run by turning off MPI support and ensuring there was a compressed POTCAR.Z file for each corresponding POTCAR file.
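
For anyone repeating the POTCAR.Z step across a whole pseudopotential tree, a minimal sketch follows. make_potcar_z is a hypothetical helper, and writing the file with gzip -c is an assumption based on the earlier post, which only says the file was created with gzip. Existing POTCAR.Z files are left alone.

```shell
#!/bin/sh
# Hypothetical helper (not part of WebMO): for every POTCAR under ROOT,
# create a compressed POTCAR.Z next to it unless one already exists.
# Assumption: a gzip-format file under the .Z name is acceptable, which
# matches what the thread reports working.
make_potcar_z() {
    root=$1
    # -H lets ROOT itself be a symlink, such as a potpaw_PBE link
    find -H "$root" -type f -name POTCAR | while IFS= read -r potcar; do
        if [ ! -e "$potcar.Z" ]; then
            gzip -c "$potcar" > "$potcar.Z"
            echo "created $potcar.Z"
        fi
    done
}

# Possible use (path from the thread):
#   make_potcar_z /home/users/webmo/vasp/vasp_5.4/potpaw_PBE
```

The original POTCAR files are kept, so VASP runs outside WebMO are unaffected.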
