gromacs.qsub – utilities for batch submission systems

The module helps with writing submission scripts for various batch submission queuing systems. The known ones are stored as QueuingSystem instances in queuing_systems; append new ones to this list.

The working paradigm is that template scripts are provided (see gromacs.config.templates) and only a few place holders are substituted (using gromacs.cbook.edit_txt()).

User-supplied template scripts can be stored in gromacs.config.qscriptdir (by default ~/.gromacswrapper/qscripts) and they will be picked up before the package-supplied ones.

The Manager handles setup and control of jobs in a queuing system on a remote system via ssh.

At the moment, some of the functions in gromacs.setup use this module but it is fairly independent and could conceivably be used for a wider range of projects.

Queuing system templates

The queuing system scripts are highly specific and you will need to add your own. Templates should be shell scripts. Some parts of the templates are modified by the generate_submit_scripts() function. The “place holders” that can be replaced are shown in the table below. Typically, the place holders are either shell variable assignments or batch submission system commands. The table shows SGE commands, but PBS and LoadLeveler have similar constructs; e.g. PBS commands start with #PBS and LoadLeveler uses #@ with its own command keywords.

Substitutions in queuing system templates.
place holder       default    replacement   description        regex
#$ -N              GMX_MD     sgename       job name           /^#.*(-N|job_name)/
#$ -l walltime=    00:20:00   walltime      max run time       /^#.*(-l walltime|wall_clock_limit)/
#$ -A              BUDGET     budget        account            /^#.*(-A|account_no)/
DEFFNM=            md         deffnm        default gmx name   /^DEFFNM=/
WALL_HOURS=        0.33       walltime h    mdrun’s -maxh      /^WALL_HOURS=/
MDRUN_OPTS=        ""         mdrun_opts    more options       /^MDRUN_OPTS=/

Lines with place holders should not have any white space at the beginning. The regular expression pattern (“regex”) is used to find the lines for the replacement and the literal default values (“default”) are replaced. Not all place holders have to occur in a template; for instance, if a queue has no run time limitation then one would probably not include walltime and WALL_HOURS place holders.
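To illustrate the mechanism, the following minimal Python sketch performs the same kind of line-anchored search and replace; it is only a conceptual stand-in for gromacs.cbook.edit_txt(), and the helper name is made up for this example:

import re

def substitute_placeholder(lines, line_regex, default, new_value):
    # Replace the literal default value on every line matched by line_regex;
    # all other lines are passed through unchanged.
    pattern = re.compile(line_regex)
    return [line.replace(default, str(new_value)) if pattern.search(line) else line
            for line in lines]

# e.g. turn 'WALL_HOURS=0.33' into 'WALL_HOURS=24.0' in a template
with open("supercomputer.somewhere.fr_64core.pbs") as qscript:
    new_lines = substitute_placeholder(qscript.readlines(), r"^WALL_HOURS=", "0.33", "24.0")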

The line # JOB_ARRAY_PLACEHOLDER can be replaced by generate_submit_array() to produce a “job array” (also known as a “task array”) script that runs a large number of related simulations under the control of a single queuing system job. The individual array tasks are run from different sub-directories. Only queuing system scripts that use the bash shell are supported for job arrays at the moment.

A queuing system script must have the appropriate suffix to be properly recognized, as shown in the table below.

Suffixes for queuing system templates. Pure shell scripts are only used to run locally.
Queuing system                  suffix       notes
Sun Gridengine                  .sge         Sun’s Gridengine
Portable Batch queuing system   .pbs         OpenPBS and PBS Pro
LoadLeveler                     .ll          IBM’s LoadLeveler
bash script                     .bash, .sh   Advanced bash scripting
csh script                      .csh         avoid csh

Example queuing system script template for PBS

The following script is a usable PBS script for a super computer. It contains almost all of the replacement tokens listed in the table above (marked by ++++++); these default values must be left in the template exactly as shown, otherwise they will not be found and replaced.

#!/bin/bash
# File name: ~/.gromacswrapper/qscripts/supercomputer.somewhere.fr_64core.pbs
#PBS -N GMX_MD
#       ++++++
#PBS -j oe
#PBS -l select=8:ncpus=8:mpiprocs=8
#PBS -l walltime=00:20:00
#                ++++++++

# host: supercomputer.somewhere.fr
# queuing system: PBS

# set this to the same value as walltime; mdrun will stop cleanly
# at 0.99 * WALL_HOURS 
WALL_HOURS=0.33
#          ++++

# deffnm line is possibly modified by gromacs.setup
# (leave it as it is in the template)
DEFFNM=md
#      ++

TPR=${DEFFNM}.tpr
OUTPUT=${DEFFNM}.out
PDB=${DEFFNM}.pdb

MDRUN_OPTS=""
#          ++

# If you always want to add additional MDRUN options in this script then
# you can either do this directly in the mdrun commandline below or by
# constructs such as the following:
## MDRUN_OPTS="-npme 24 $MDRUN_OPTS"

# JOB_ARRAY_PLACEHOLDER
#++++++++++++++++++++++   leave the full commented line intact!

# avoids some failures
export MPI_GROUP_MAX=1024
# use hard coded path for time being
GMXBIN="/opt/software/SGI/gromacs/4.0.3/bin"
MPIRUN=/usr/pbs/bin/mpiexec
APPLICATION=$GMXBIN/mdrun_mpi

$MPIRUN $APPLICATION -stepout 1000 -deffnm ${DEFFNM} -s ${TPR} -c ${PDB} -cpi \
        $MDRUN_OPTS \
        -maxh ${WALL_HOURS} > $OUTPUT
rc=$?

# dependent jobs will only start if rc == 0
exit $rc

Save the above script in ~/.gromacswrapper/qscripts under the name supercomputer.somewhere.fr_64core.pbs. This will make the script immediately usable. For example, in order to set up a production MD run with gromacs.setup.MD() for this super computer one would use

gromacs.setup.MD(..., qscripts=['supercomputer.somewhere.fr_64core.pbs', 'local.sh'])

This will generate submission scripts based on supercomputer.somewhere.fr_64core.pbs and also the default local.sh that is provided with GromacsWrapper.

In order to modify MDRUN_OPTS one would use the additional mdrun_opts argument, for instance:

gromacs.setup.MD(..., qscripts=['supercomputer.somewhere.fr_64core.pbs', 'local.sh'],
                 mdrun_opts="-v -npme 20 -dlb yes -nosum")

Currently there is no good way to specify the number of processors when creating run scripts. You will need to provide scripts with different numbers of cores hard-coded, or set the processor count when submitting the scripts with command-line options to qsub.
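For example, with PBS Pro the resource request baked into the script can typically be overridden on the command line at submission time (the exact resource string depends on your site):

qsub -l select=16:ncpus=8:mpiprocs=8 supercomputer.somewhere.fr_64core.pbs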

Classes and functions

class gromacs.qsub.QueuingSystem(name, suffix, qsub_prefix, array_variable=None, array_option=None)

Class that represents minimum information about a batch submission system.

Define a queuing system’s functionality

Arguments:
name

name of the queuing system, e.g. ‘Sun Gridengine’

suffix

suffix of input files, e.g. ‘sge’

qsub_prefix

prefix string that starts a qsub flag in a script, e.g. ‘#$’

Keywords:
array_variable

environment variable exported for array jobs, e.g. ‘SGE_TASK_ID’

array_option

qsub option format string to launch an array (e.g. ‘-t %d-%d’)

array(directories)

Return multiline string for simple array jobs over directories.

Warning

The string is in bash and hence the template must also be bash (and not csh or sh).

array_flag(directories)
Return string to embed the array launching option in the script.
flag(*args)
Return string for qsub flag args prefixed with the appropriate in-script prefix.
has_arrays()
True if known how to do job arrays.
isMine(scriptname)
Primitive queuing system detection; only looks at suffix at the moment.
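A short illustration of the class, based on the method descriptions above (the commented return values are assumptions, not verified output):

import gromacs.qsub

pbs = gromacs.qsub.QueuingSystem("Portable Batch queuing system", "pbs", "#PBS")
pbs.flag("-l", "walltime=00:20:00")   # expected: something like '#PBS -l walltime=00:20:00'
pbs.has_arrays()                      # False: no array_variable/array_option was supplied
pbs.isMine("md.pbs")                  # True: detection only looks at the .pbs suffix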
gromacs.qsub.generate_submit_scripts(templates, prefix=None, deffnm='md', jobname='MD', budget=None, mdrun_opts=None, walltime=1.0, jobarray_string=None, **kwargs)

Write scripts for queuing systems.

This sets up queuing system run scripts with a simple search and replace in templates. See gromacs.cbook.edit_txt() for details. Shell scripts are made executable.

Arguments:
templates

Template file or list of template files. The “files” can also be names or symbolic names for templates in the templates directory. See gromacs.config for details and rules for writing templates.

prefix

Prefix for the final run script filename; by default the filename will be the same as the template. [None]

dirname

Directory in which to place the submit scripts. [.]

deffnm

Default filename prefix for mdrun -deffnm [md]

jobname

Name of the job in the queuing system. [MD]

budget

Which budget to book the runtime on [None]

mdrun_opts

String of additional options for mdrun.

walltime

Maximum runtime of the job in hours. [1]

jobarray_string

Multi-line string that is spliced in for job array functionality (see gromacs.qsub.generate_submit_array(); do not use manually)

kwargs

all other kwargs are ignored

Returns:

list of generated run scripts
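A typical call might look like the following sketch (the template names follow the PBS example above; all keyword values are illustrative):

import gromacs.qsub

scripts = gromacs.qsub.generate_submit_scripts(
    ['supercomputer.somewhere.fr_64core.pbs', 'local.sh'],
    deffnm='md', jobname='MD_protein', budget='MYBUDGET',
    mdrun_opts='-v', walltime=24.0)
# scripts is the list of file names of the generated run scripts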

gromacs.qsub.generate_submit_array(templates, directories, **kwargs)

Generate an array job.

For each work_dir in directories, the array job will
  1. cd into work_dir
  2. run the job as detailed in the template

It will use all the queuing system directives found in the template. If more complicated setups are required, then this function cannot be used.

Arguments:
templates

Basic template for a single job; the job array logic is spliced into the position of the line

# JOB_ARRAY_PLACEHOLDER

The appropriate commands for common queuing systems (Sun Gridengine, PBS) are hard coded here. The queuing system is detected from the suffix of the template.

directories

List of directories under dirname. One task is set up for each directory.

dirname

The array script will be placed in this directory. The directories must be located under dirname.

kwargs

See gromacs.setup.generate_submit_script() for details.
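For example, to run the same template in a set of task directories (directory names are made up for illustration):

import gromacs.qsub

gromacs.qsub.generate_submit_array(
    ['supercomputer.somewhere.fr_64core.pbs'],
    ['MD_001', 'MD_002', 'MD_003'],
    dirname='.')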

gromacs.qsub.detect_queuing_system(scriptfile)
Return the queuing system for which scriptfile was written.
gromacs.qsub.queuing_systems
Pre-defined queuing systems (SGE, PBS). Add your own here.
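For instance, a hypothetical SLURM-like system could be registered roughly as follows (all values are illustrative assumptions, not a tested configuration):

import gromacs.qsub

slurm = gromacs.qsub.QueuingSystem(
    'SLURM', 'slurm', '#SBATCH',
    array_variable='SLURM_ARRAY_TASK_ID',
    array_option='--array=%d-%d')
gromacs.qsub.queuing_systems.append(slurm)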

Queuing system Manager

The Manager class must be customized for each system such as a cluster or a super computer. It then allows submission and control of jobs remotely (using ssh).

class gromacs.qsub.Manager(dirname='.', **kwargs)

Base class to launch simulations remotely on computers with queuing systems.

Basically, ssh into machine and run job.

Derive a class from Manager, override the attributes _hostname, _scratchdir, _qscript, and _walltime (described below), and implement a specialized Manager.qsub() method if needed.

ssh must be set up (via ~/.ssh/config) to allow access via a commandline such as

ssh <hostname> <command> ...

Typically you want something such as

host <hostname>
     hostname <hostname>.fqdn.org
     user     <remote_user>

in ~/.ssh/config and also set up public-key authentication in order to avoid typing your password all the time.

Set up the manager.

Arguments:
statedir

directory component under the remote scratch dir (should be different for different jobs) [basename(CWD)]

prefix

identifier for job names [MD]

_hostname
hostname of the super computer (required)
_scratchdir
scratch dir on hostname (required)
_qscript
name of the template submission script appropriate for the queuing system on Manager._hostname; can be a path to a local file or a template stored in gromacs.config.qscriptdir or a key for gromacs.config.templates (required)
_walltime
maximum run time of script in hours; the queuing system script Manager._qscript is supposed to stop mdrun after 99% of this time via the -maxh option. A value of None or inf indicates no limit.
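A minimal sketch of a derived manager; the hostname and template name are taken from the PBS example above, and the scratch directory is a placeholder:

import gromacs.qsub

class SupercomputerManager(gromacs.qsub.Manager):
    _hostname = "supercomputer.somewhere.fr"
    _scratchdir = "/scratch/remote_user"               # placeholder path
    _qscript = "supercomputer.somewhere.fr_64core.pbs"
    _walltime = 0.33                                   # hours, as in the template

m = SupercomputerManager(prefix="MD")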
job_done()
alias for get_status()
qstat()
alias for get_status()
cat(dirname, prefix='md', cleanup=True)

Concatenate parts of a run in dirname.

Always uses gromacs.cbook.cat() with resolve_multi = ‘guess’.

Note

The default is to immediately delete the original files (cleanup = True).

Keywords:
dirname

directory to work in

prefix

prefix (deffnm) of the files [md]

cleanup : boolean

if True, remove all used files [True]
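For example, on a manager instance m such as the one sketched above, one might join the parts of a run without deleting the originals (the directory name is illustrative):

m.cat("MD_run", prefix="md", cleanup=False)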

get(dirname, checkfile=None, targetdir='.')

scp -r dirname from host into targetdir

Arguments:
  • dirname: dir to download
  • checkfile: raise OSError/ENOENT if targetdir/dirname/checkfile was not found
  • targetdir: put dirname into this directory
Returns:

return code from scp

get_dir(*args)
Directory on the remote machine.
get_status(dirname, logfilename='md*.log', silent=False)

Check status of remote job by looking into the logfile.

Reports on the status of the job and extracts the performance in ns/d if available (the value is saved in Manager.performance).

Arguments:
  • dirname
  • logfilename can be a shell glob pattern [md*.log]
  • silent = True/False; True suppresses log.info messages
Returns:

True if the job is done, False if it is still running, None if no log file was found to look at

Note

Also returns False if the connection failed.

Warning

This is an important but somewhat fragile method. It needs to be improved to be more robust.

local_get(dirname, checkfile, cattrajectories=True, cleanup=False)

Find checkfile locally if possible.

If checkfile is not found in dirname then it is transferred from the remote host.

If needed, the trajectories are concatenated using Manager.cat().

Returns: local path of checkfile
log_RE
Regular expression used by Manager.get_status() to parse the logfile from mdrun.
ndependent(runtime, performance=None, walltime=None)

Calculate how many dependent (chained) jobs are required.

Uses performance in ns/d (gathered from get_status()) and job max walltime (in hours) from the class unless provided as keywords.

n = ceil(runtime/(performance*0.99*walltime))
Keywords:
runtime

length of run in ns

performance

ns/d with the given setup

walltime

maximum run length of the script (using 99% of it), in h

Returns:

n or 1 if walltime is unlimited

put(dirname)

scp dirname to host.

Arguments: dirname to be transferred
Returns: return code from scp
putfile(filename, dirname)

scp filename to host in dirname.

Arguments: filename and dirname to be transferred to
Returns: return code from scp
qsub(dirname, **kwargs)

Submit job remotely on host.

This is the most primitive implementation: it just runs the commands

cd remotedir && qsub qscript

on Manager._hostname. remotedir is dirname under Manager._scratchdir and qscript defaults to the queuing system script that was produced from the template Manager._qscript.

remotepath(*args)
Directory on the remote machine.
remoteuri(*args)
URI of the directory on the remote machine.
setup_MD(jobnumber, struct='MD_POSRES/md.pdb', **kwargs)

Set up production and transfer to host.

Arguments:
  • jobnumber: 1,2 ...
  • struct is the starting structure (default from POSRES run but that is just a guess);
  • kwargs are passed to gromacs.setup.MD()
setup_posres(**kwargs)

Set up position restraints run and transfer to host.

kwargs are passed to gromacs.setup.MD_restrained()

waitfor(dirname, **kwargs)

Wait until the job associated with dirname is done.

Super-primitive; it polls in a simple while ... sleep loop, waiting seconds between checks.

Arguments:
dirname

look for log files under the remote dir corresponding to dirname

seconds

delay in seconds during re-polling
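Putting it together, a rough remote workflow with a derived manager such as the SupercomputerManager sketched above might look like this (directory and file names are illustrative assumptions):

m = SupercomputerManager(prefix="MD")
m.put("MD_run")                        # scp the local directory to the remote scratch dir
m.qsub("MD_run")                       # cd into the remote directory and submit the qscript
m.waitfor("MD_run", seconds=300)       # poll every 5 minutes until the job is done
m.get("MD_run", checkfile="md.gro")    # copy results back; md.gro is just an example file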