Package gromacs :: Module cbook
[hide private]
[frames] | no frames]

Module cbook

source code


:mod:`gromacs.cbook` -- Gromacs Cook Book
=========================================

The :mod:`~gromacs.cbook` (cook book) module contains short recipes for tasks
that are often repeated. In the simplest case this is just one of the
gromacs tools with a certain set of default command line options.

By abstracting and collecting these invocations here, errors can be
reduced and the code snippets can also serve as canonical examples for
how to do simple things.

Miscellaneous canned Gromacs commands
-------------------------------------

Simple commands with new default options so that they solve a specific
problem (see also `Manipulating trajectories and structures`_):

.. function:: rmsd_backbone([s="md.tpr", f="md.xtc"[, ...]])

   Computes the RMSD of the "Backbone" atoms after fitting to the
   "Backbone" (including both translation and rotation).


Manipulating trajectories and structures
----------------------------------------

Standard invocations for manipulating trajectories. 

.. function:: trj_compact([s="md.tpr", f="md.xtc", o="compact.xtc"[, ...]])

   Writes an output trajectory or frame with a compact representation
   of the system centered on the protein. It centers on the group
   "Protein" and outputs the whole "System" group.


.. function:: trj_xyfitted([s="md.tpr", f="md.xtc"[, ...]])

   Writes a trajectory centered and fitted to the protein in the XY-plane only.

   This is useful for membrane proteins. The system *must* be oriented so that
   the membrane is in the XY plane. The protein backbone is used for the least
   square fit, centering is done for the whole protein., but this can be
   changed with the *input* = ``('backbone', 'protein','system')`` keyword.

   .. Note:: Gromacs 4.x only
    
.. autofunction:: trj_fitandcenter
.. autofunction:: cat
.. autoclass:: Frames
   :members:
.. autoclass:: Transformer
   :members:
.. autofunction:: get_volume

Processing output
-----------------

There are cases when a script has to to do different things depending
on the output from a Gromacs tool. 

For instance, a common case is to check the total charge after
grompping a tpr file. The ``grompp_qtot`` function does just that.

.. autofunction:: grompp_qtot
.. autofunction:: get_volume
.. autofunction:: parse_ndxlist


Working with topologies and mdp files
-------------------------------------

.. autofunction:: create_portable_topology
.. autofunction:: edit_mdp
.. autofunction:: add_mdp_includes
.. autofunction:: grompp_qtot


Working with index files
------------------------

Manipulation of index files (``ndx``) can be cumbersome because the
``make_ndx`` program is not very sophisticated (yet) compared to
full-fledged atom selection expression as available in Charmm_, VMD_, or
MDAnalysis_. Some tools help in building and interpreting index files.

.. SeeAlso:: The :class:`gromacs.formats.NDX` class can solve a number
             of index problems in a cleaner way than the classes and
             functions here.

.. autoclass:: IndexBuilder
   :members: combine, gmx_resid

.. autofunction:: parse_ndxlist
.. autofunction:: get_ndx_groups
.. autofunction:: make_ndx_captured


.. _MDAnalysis: http://mdanalysis.googlecode.com
.. _VMD: http://www.ks.uiuc.edu/Research/vmd/current/ug/node87.html
.. _Charmm: http://www.charmm.org/html/documentation/c35b1/select.html


File editing functions
----------------------

It is often rather useful to be able to change parts of a template
file. For specialized cases the two following functions are useful:

.. autofunction:: edit_mdp
.. autofunction:: edit_txt

Classes [hide private]
  Frames
A iterator that transparently provides frames from a trajectory.
  IndexBuilder
Build an index file with specified groups and the combined group.
  Transformer
Class to handle transformations of trajectories.
Functions [hide private]
 
trj_fitandcenter(xy=False, **kwargs)
Center everything and make a compact representation (pass 1) and fit the system to a reference (pass 2).
source code
 
cat(prefix='md', dirname='.', partsdir='parts', fulldir='full', resolve_multi='pass')
Concatenate all parts of a simulation.
source code
 
glob_parts(prefix, ext)
Find files from a continuation run
source code
 
grompp_qtot(*args, **kwargs)
Run ``gromacs.grompp`` and return the total charge of the system.
source code
 
_mdp_include_string(dirs)
Generate a string that can be added to a mdp 'include = ' line.
source code
 
add_mdp_includes(topology=None, kwargs=None)
Set the mdp *include* key in the *kwargs* dict.
source code
 
create_portable_topology(topol, struct, **kwargs)
Create a processed topology.
source code
 
get_volume(f)
Return the volume in nm^3 of structure file *f*.
source code
 
edit_mdp(mdp, new_mdp=None, extend_parameters=None, **substitutions)
Change values in a Gromacs mdp file.
source code
 
edit_txt(filename, substitutions, newname=None)
Primitive text file stream editor.
source code
 
make_ndx_captured(**kwargs)
make_ndx that captures all output
source code
 
get_ndx_groups(ndx, **kwargs)
Return a list of index groups in the index file *ndx*.
source code
 
parse_ndxlist(output)
Parse output from make_ndx to build list of index groups:
source code
 
parse_groups(output)
Parse make_ndx output and return groups as a list of dicts.
source code
Variables [hide private]
  logger = logging.getLogger('gromacs.cbook')
  trj_compact = tools.Trjconv(ur= 'compact', center= True, boxce...
  rmsd_backbone = tools.G_rms(what= 'rmsd', fit= 'rot+trans', in...
  trj_xyfitted = tools.Trjconv(fit= 'rotxy+transxy', center= Tru...
  NDXLIST = re.compile(r'(?x)>\s+\n\n(?P<LIST>(\s*\d+\s+[^\s]+\s...
compiled regular expression to match a list of index groups in the output of ``make_ndx``s <Enter> (empty) command.
  NDXGROUP = re.compile(r'(?x)\s*(?P<GROUPNUMBER>\d+)\s+(?P<GROU...
compiled regular expression to match a single line of make_ndx output (e.g.
Function Details [hide private]

trj_fitandcenter(xy=False, **kwargs)

source code 
Center everything and make a compact representation (pass 1) and fit the system to a reference (pass 2).

:Keywords:
   *s*
       input structure file (tpr file required to make molecule whole)
   *f*
       input trajectory
   *o*
       output trajectory
   *input*
       A list with three groups. The default is 
           ['backbone', 'protein','system']
       The fit command uses all three (1st for least square fit,
       2nd for centering, 3rd for output), the centered/make-whole stage use
       2nd for centering and 3rd for output.
   *input1*
       If *input1* is supplied then *input* is used exclusively
       for the fitting stage (pass 2) and *input1* for the centering (pass 1).
   *n*
       Index file used for pass 1 and pass 2.
   *n1*
       If *n1* is supplied then index *n1* is only used for pass 1
       (centering) and *n* for pass 2 (fitting).
   *xy* : boolean
       If ``True`` then only do a rot+trans fit in the xy plane
       (good for membrane simulations); default is ``False``.
   *kwargs*
       All other arguments are passed to :class:`~gromacs.tools.Trjconv`.

Note that here we first center the protein and create a compact box, using
``-pbc mol -ur compact -center -boxcenter tric`` and write an intermediate
xtc. Then in a second pass we perform a rotation+translation fit (or
restricted to the xy plane if *xy* = ``True`` is set) on the intermediate
xtc to produce the final trajectory. Doing it in this order has the
disadvantage that the solvent box is rotating around the protein but the
opposite order (with center/compact second) produces strange artifacts
where columns of solvent appear cut out from the box---it probably means
that after rotation the information for the periodic boundaries is not
correct any more.

Most kwargs are passed to both invocations of
:class:`gromacs.tools.Trjconv` so it does not really make sense to use eg
*skip*; in this case do things manually.

By default the *input* to the fit command is ('backbone',
'protein','system'); the compact command always uses the second and third
group for its purposes or if this fails, prompts the user.

Both steps cannot performed in one pass; this is a known limitation of
``trjconv``. An intermediate temporary XTC files is generated which should
be automatically cleaned up unless bad things happened.

The function tries to honour the input/output formats. For instance, if you
want trr output you need to supply a trr file as input and explicitly give
the output file also a trr suffix.

.. Note:: For big trajectories it can **take a very long time**
          and consume a **large amount of temporary diskspace**.

We follow the `g_spatial documentation`_ in preparing the trajectories::

   trjconv -s a.tpr -f a.xtc -o b.xtc -center tric -ur compact -pbc none
   trjconv -s a.tpr -f b.xtc -o c.xtc -fit rot+trans

.. _`g_spatial documentation`: http://oldwiki.gromacs.org/index.php/Manual:g_spatial_4.0.3

cat(prefix='md', dirname='.', partsdir='parts', fulldir='full', resolve_multi='pass')

source code 
Concatenate all parts of a simulation.

The xtc, trr, and edr files in *dirname* such as prefix.xtc,
prefix.part0002.xtc, prefix.part0003.xtc, ... are

   1) moved to the *partsdir* (under *dirname*)
   2) concatenated with the Gromacs tools to yield prefix.xtc, prefix.trr, 
      prefix.edr, prefix.gro (or prefix.md) in *dirname*
   3) Store these trajectories in *fulldir*

.. Note:: Trajectory files are *never* deleted by this function to avoid 
          data loss in case of bugs. You will have to clean up yourself
          by deleting *dirname*/*partsdir*.

          Symlinks for the trajectories are *not* handled well and
          break the function. Use hard links instead.

.. Warning:: If an exception occurs when running this function then make
             doubly and triply sure where your files are before running
             this function again; otherwise you might **overwrite data**.
             Possibly you will need to manually move the files from *partsdir*
             back into the working directory *dirname*; this should onlu overwrite
             generated files so far but *check carefully*!

:Keywords:
    *prefix*
        deffnm of the trajectories [md]
    *resolve_multi"
        how to deal with multiple "final" gro or pdb files: normally there should
        only be one but in case of restarting from the checkpoint of a finished
        simulation one can end up with multiple identical ones.
          - "pass" : do nothing and log a warning
          - "guess" : take prefix.pdb or prefix.gro if it exists, otherwise the one of
                      prefix.partNNNN.gro|pdb with the highes NNNN
    *dirname*
        change to *dirname* and assume all tarjectories are located there [.]
    *partsdir*
         directory where to store the input files (they are moved out of the way);
         *partsdir* must be manually deleted [parts]
    *fulldir*
         directory where to store the final results [full]           

grompp_qtot(*args, **kwargs)

source code 
Run ``gromacs.grompp`` and return the total charge of the  system.

:Arguments:  
   The arguments are the ones one would pass to :func:`gromacs.grompp`.
:Returns:
   The total charge as reported

Some things to keep in mind:

* The stdout output of grompp is not shown. This can make debugging
  pretty hard.  Try running the normal :func:`gromacs.grompp` command and
  analyze the output if the debugging messages are not sufficient.
* Check that ``qtot`` is correct; because the function is based on pattern 
  matching of the output it can break when the output format changes.

add_mdp_includes(topology=None, kwargs=None)

source code 
Set the mdp *include* key in the *kwargs* dict.

1. Add the directory containing *topology*.
2. Add all directories appearing under the key *includes*
3. Generate a string of the form "-Idir1 -Idir2 ..." that
   is stored under the key *include* (the corresponding
   mdp parameter)

By default, the directories ``.`` and ``..`` are also added to the
*include* string for the mdp; when fed into
:func:`gromacs.cbook.edit_mdp` it will result in a line such as ::

  include = -I. -I.. -I../topology_dir ....

Note that the user can always override the behaviour by setting
the *include* keyword herself; in this case this function does
nothing.

If no *kwargs* were supplied then a dict is generated with the
single *include* entry.

:Arguments:
   *topology* : top filename
      Topology file; the name of the enclosing directory is added
      to the include path (if supplied) [``None``]
   *kwargs* : dict
      Optional dictionary of mdp keywords; will be modified in place.
      If it contains the *includes* keyword with either a single string
      or a list of strings then these paths will be added to the
      include statement.
:Returns: *kwargs* with the *include* keyword added if it did not
          exist previously; if the keyword already existed, nothing
          happens.

.. Note:: The *kwargs* dict is **modified in place**. This
          function is a bit of a hack. It might be removed once
          all setup functions become methods in a nice class.

create_portable_topology(topol, struct, **kwargs)

source code 

Create a processed topology.

The processed (or portable) topology file does not contain any #include statements and hence can be easily copied around. It also makes it possible to re-grompp without having any special itp files available.

Returns:
full path to the processed trajectory

Arguments:

topol
topology file
struct
coordinat (structure) file

Keywords:

processed
name of the new topology file; if not set then it is named like topol but with pp_ prepended
includes
path or list of paths of directories in which itp files are searched for

get_volume(f)

source code 
Return the volume in nm^3 of structure file *f*.

(Uses :func:`gromacs.editconf`; error handling is not good)

edit_mdp(mdp, new_mdp=None, extend_parameters=None, **substitutions)

source code 
Change values in a Gromacs mdp file.

Parameters and values are supplied as substitutions, eg ``nsteps=1000``.

By default the template mdp file is **overwritten in place**.

If a parameter does not exist in the template then it cannot be substituted
and the parameter/value pair is returned. The user has to check the
returned list in order to make sure that everything worked as expected. At
the moment it is not possible to automatically append the new values to the
mdp file because of ambiguities when having to replace dashes in parameter
names with underscores (see the notes below on dashes/underscores).

If a parameter is set to the value ``None`` then it will be ignored.

:Arguments:
    *mdp* : filename
        filename of input (and output filename of ``new_mdp=None``)
    *new_mdp* : filename
        filename of alternative output mdp file [None]
    *extend_parameters* : string or list of strings
        single parameter or list of parameters for which the new values 
        should be appended to the existing value in the mdp file. This 
        makes mostly sense for a single parameter, namely 'include', which
        is set as the default. Set to ``[]`` to disable. ['include']
    *substitutions*
        parameter=value pairs, where parameter is defined by the Gromacs mdp file; 
        dashes in parameter names have to be replaced by underscores.

:Returns:    
    Dict of parameters that have *not* been substituted.

**Example** ::

   edit_mdp('md.mdp', new_mdp='long_md.mdp', nsteps=100000, nstxtcout=1000, lincs_iter=2)

.. Note::

   * Dashes in Gromacs mdp parameters have to be replaced by an underscore
     when supplied as python keyword arguments (a limitation of python). For example
     the MDP syntax is  ``lincs-iter = 4`` but the corresponding  keyword would be 
     ``lincs_iter = 4``.
   * If the keyword is set as a dict key, eg ``mdp_params['lincs-iter']=4`` then one
     does not have to substitute.
   * Parameters *aa_bb* and *aa-bb* are considered the same (although this should
     not be a problem in practice because there are no mdp parameters that only 
     differ by a underscore). 
   * This code is more compact in ``Perl`` as one can use ``s///`` operators:
     ``s/^(\s*${key}\s*=\s*).*/$1${val}/``

.. SeeAlso:: One can also load the mdp file with
            :class:`gromacs.formats.MDP`, edit the object (a dict), and save it again.

edit_txt(filename, substitutions, newname=None)

source code 
Primitive text file stream editor.

This function can be used to edit free-form text files such as the
topology file. By default it does an **in-place edit** of
*filename*. If *newname* is supplied then the edited
file is written to *newname*.

:Arguments:
   *filename*
       input text file
   *substitutions*
       substitution commands (see below for format)
   *newname*
       output filename; if ``None`` then *filename* is changed in
       place [``None``]       

*substitutions* is a list of triplets; the first two elements are regular
expression strings, the last is the substitution value. It mimics
``sed`` search and replace. The rules for *substitutions*:
    
.. productionlist::
   substitutions: "[" search_replace_tuple, ... "]"
   search_replace_tuple: "(" line_match_RE "," search_RE "," replacement ")"
   line_match_RE: regular expression that selects the line (uses match)
   search_RE: regular expression that is searched in the line
   replacement: replacement string for search_RE

Running :func:`edit_txt` does pretty much what a simple ::

     sed /line_match_RE/s/search_RE/replacement/

with repeated substitution commands does.

Special replacement values:
- ``None``: the rule is ignored
- ``False``: the line is deleted (even if other rules match)

.. note::

   * No sanity checks are performed and the substitutions must be supplied
     exactly as shown.
   * All substitutions are applied to a line; thus the order of the substitution 
     commands may matter when one substitution generates a match for a subsequent rule.
   * If replacement is set to ``None`` then the whole expression is ignored and
     whatever is in the template is used. To unset values you must provided an
     empty string or similar.
   * Delete a matching line if replacement=``False``.

make_ndx_captured(**kwargs)

source code 
make_ndx that captures all output

Standard :func:`~gromacs.make_ndx` command with the input and
output pre-set in such a way that it can be conveniently used for
:func:`parse_ndxlist`.     

Example::
  ndx_groups = parse_ndxlist(make_ndx_captured(n=ndx)[0])

Note that the convenient :func:`get_ndx_groups` function does exactly
that and can probably used in most cases.

:Arguments:
    keywords are passed on to :func:`~gromacs.make_ndx`
:Returns:
    (*returncode*, *output*, ``None``)

get_ndx_groups(ndx, **kwargs)

source code 
Return a list of index groups in the index file *ndx*.

:Arguments:  
    - *ndx*  is a Gromacs index file.
    - kwargs are passed to :func:`make_ndx_captured`.
    
:Returns:
    list of groups as supplied by :func:`parse_ndxlist`

Alternatively, load the index file with
:class:`gromacs.formats.NDX` for full control.

parse_ndxlist(output)

source code 

Parse output from make_ndx to build list of index groups:

groups = parse_ndxlist(output)

output should be the standard output from make_ndx, e.g.:

rc,output,junk = gromacs.make_ndx(..., input=('', 'q'), stdout=False, stderr=True)

(or simply use

rc,output,junk = cbook.make_ndx_captured(...)

which presets input, stdout and stderr; of course input can be overriden.)

Returns:

The function returns a list of dicts (groups) with fields

name
name of the groups
nr
number of the group (starts at 0)
natoms
number of atoms in the group

Variables Details [hide private]

trj_compact

Value:
tools.Trjconv(ur= 'compact', center= True, boxcenter= 'tric', pbc= 'mo\
l', input= ('protein', 'system'), doc= """
Writes a compact representation of the system centered on the protein"\
"")

rmsd_backbone

Value:
tools.G_rms(what= 'rmsd', fit= 'rot+trans', input= ('Backbone', 'Backb\
one'), doc= """
Computes RMSD of backbone after fitting to the backbone.""")

trj_xyfitted

Value:
tools.Trjconv(fit= 'rotxy+transxy', center= True, boxcenter= 'rect', p\
bc= 'whole', input= ('backbone', 'protein', 'system'), doc= """
Writes a trajectory centered and fitted to the protein in the XY-plane\
 only.

This is useful for membrane proteins. The system *must* be oriented so
that the membrane is in the XY plane. The protein backbone is used
for the least square fit, centering is done for the whole protein.
...

NDXLIST

compiled regular expression to match a list of index groups in the output of ``make_ndx``s <Enter> (empty) command.
Value:
re.compile(r'(?x)>\s+\n\n(?P<LIST>(\s*\d+\s+[^\s]+\s*:\s*\d+\satoms\n)\
+)')

NDXGROUP

compiled regular expression to match a single line of make_ndx output (e.g. after a successful group creation)
Value:
re.compile(r'(?x)\s*(?P<GROUPNUMBER>\d+)\s+(?P<GROUPNAME>[^\s]+)\s*:\s\
*(?P<NATOMS>\d+)\satoms')