1
2
3
4
5 """
6 :mod:`gromacs.cbook` -- Gromacs Cook Book
7 =========================================
8
9 The :mod:`~gromacs.cbook` (cook book) module contains short recipes for tasks
10 that are often repeated. In the simplest case this is just one of the
11 gromacs tools with a certain set of default command line options.
12
13 By abstracting and collecting these invocations here, errors can be
14 reduced and the code snippets can also serve as canonical examples for
15 how to do simple things.
16
17 Miscellaneous canned Gromacs commands
18 -------------------------------------
19
20 Simple commands with new default options so that they solve a specific
21 problem (see also `Manipulating trajectories and structures`_):
22
23 .. function:: rmsd_backbone([s="md.tpr", f="md.xtc"[, ...]])
24
25 Computes the RMSD of the "Backbone" atoms after fitting to the
26 "Backbone" (including both translation and rotation).
27
28
29 Manipulating trajectories and structures
30 ----------------------------------------
31
32 Standard invocations for manipulating trajectories.
33
34 .. function:: trj_compact([s="md.tpr", f="md.xtc", o="compact.xtc"[, ...]])
35
36 Writes an output trajectory or frame with a compact representation
37 of the system centered on the protein. It centers on the group
38 "Protein" and outputs the whole "System" group.
39
40
41 .. function:: trj_xyfitted([s="md.tpr", f="md.xtc"[, ...]])
42
43 Writes a trajectory centered and fitted to the protein in the XY-plane only.
44
45 This is useful for membrane proteins. The system *must* be oriented so that
46 the membrane is in the XY plane. The protein backbone is used for the least
47 square fit, centering is done for the whole protein., but this can be
48 changed with the *input* = ``('backbone', 'protein','system')`` keyword.
49
50 .. Note:: Gromacs 4.x only
51
52 .. autofunction:: trj_fitandcenter
53 .. autofunction:: cat
54 .. autoclass:: Frames
55 :members:
56 .. autoclass:: Transformer
57 :members:
58 .. autofunction:: get_volume
59
60 Processing output
61 -----------------
62
63 There are cases when a script has to to do different things depending
64 on the output from a Gromacs tool.
65
66 For instance, a common case is to check the total charge after
67 grompping a tpr file. The ``grompp_qtot`` function does just that.
68
69 .. autofunction:: grompp_qtot
70 .. autofunction:: get_volume
71 .. autofunction:: parse_ndxlist
72
73
74 Working with topologies and mdp files
75 -------------------------------------
76
77 .. autofunction:: create_portable_topology
78 .. autofunction:: edit_mdp
79 .. autofunction:: add_mdp_includes
80 .. autofunction:: grompp_qtot
81
82
83 Working with index files
84 ------------------------
85
86 Manipulation of index files (``ndx``) can be cumbersome because the
87 ``make_ndx`` program is not very sophisticated (yet) compared to
88 full-fledged atom selection expression as available in Charmm_, VMD_, or
89 MDAnalysis_. Some tools help in building and interpreting index files.
90
91 .. SeeAlso:: The :class:`gromacs.formats.NDX` class can solve a number
92 of index problems in a cleaner way than the classes and
93 functions here.
94
95 .. autoclass:: IndexBuilder
96 :members: combine, gmx_resid
97
98 .. autofunction:: parse_ndxlist
99 .. autofunction:: get_ndx_groups
100 .. autofunction:: make_ndx_captured
101
102
103 .. _MDAnalysis: http://mdanalysis.googlecode.com
104 .. _VMD: http://www.ks.uiuc.edu/Research/vmd/current/ug/node87.html
105 .. _Charmm: http://www.charmm.org/html/documentation/c35b1/select.html
106
107
108 File editing functions
109 ----------------------
110
111 It is often rather useful to be able to change parts of a template
112 file. For specialized cases the two following functions are useful:
113
114 .. autofunction:: edit_mdp
115 .. autofunction:: edit_txt
116 """
117
118
119
120
121
122
123
124
125
126
127
128 from __future__ import with_statement
129
130 __docformat__ = "restructuredtext en"
131
132 import sys
133 import os
134 import re
135 import warnings
136 import tempfile
137 import shutil
138 import glob
139
140 import logging
141 logger = logging.getLogger('gromacs.cbook')
142
143 import gromacs
144 from gromacs import GromacsError, BadParameterWarning, MissingDataWarning, GromacsValueWarning
145 import tools
146 import utilities
147 from utilities import asiterable
148
149 trj_compact = tools.Trjconv(ur='compact', center=True, boxcenter='tric', pbc='mol',
150 input=('protein','system'),
151 doc="""
152 Writes a compact representation of the system centered on the protein""")
153
154 rmsd_backbone = tools.G_rms(what='rmsd', fit='rot+trans',
155 input=('Backbone','Backbone'),
156 doc="""
157 Computes RMSD of backbone after fitting to the backbone.""")
158
159
160 trj_xyfitted = tools.Trjconv(fit='rotxy+transxy',
161 center=True, boxcenter='rect', pbc='whole',
162 input=('backbone', 'protein','system'),
163 doc="""
164 Writes a trajectory centered and fitted to the protein in the XY-plane only.
165
166 This is useful for membrane proteins. The system *must* be oriented so
167 that the membrane is in the XY plane. The protein backbone is used
168 for the least square fit, centering is done for the whole protein.
169
170 .. Note:: Gromacs 4.x only""")
173 """Center everything and make a compact representation (pass 1) and fit the system to a reference (pass 2).
174
175 :Keywords:
176 *s*
177 input structure file (tpr file required to make molecule whole)
178 *f*
179 input trajectory
180 *o*
181 output trajectory
182 *input*
183 A list with three groups. The default is
184 ['backbone', 'protein','system']
185 The fit command uses all three (1st for least square fit,
186 2nd for centering, 3rd for output), the centered/make-whole stage use
187 2nd for centering and 3rd for output.
188 *input1*
189 If *input1* is supplied then *input* is used exclusively
190 for the fitting stage (pass 2) and *input1* for the centering (pass 1).
191 *n*
192 Index file used for pass 1 and pass 2.
193 *n1*
194 If *n1* is supplied then index *n1* is only used for pass 1
195 (centering) and *n* for pass 2 (fitting).
196 *xy* : boolean
197 If ``True`` then only do a rot+trans fit in the xy plane
198 (good for membrane simulations); default is ``False``.
199 *kwargs*
200 All other arguments are passed to :class:`~gromacs.tools.Trjconv`.
201
202 Note that here we first center the protein and create a compact box, using
203 ``-pbc mol -ur compact -center -boxcenter tric`` and write an intermediate
204 xtc. Then in a second pass we perform a rotation+translation fit (or
205 restricted to the xy plane if *xy* = ``True`` is set) on the intermediate
206 xtc to produce the final trajectory. Doing it in this order has the
207 disadvantage that the solvent box is rotating around the protein but the
208 opposite order (with center/compact second) produces strange artifacts
209 where columns of solvent appear cut out from the box---it probably means
210 that after rotation the information for the periodic boundaries is not
211 correct any more.
212
213 Most kwargs are passed to both invocations of
214 :class:`gromacs.tools.Trjconv` so it does not really make sense to use eg
215 *skip*; in this case do things manually.
216
217 By default the *input* to the fit command is ('backbone',
218 'protein','system'); the compact command always uses the second and third
219 group for its purposes or if this fails, prompts the user.
220
221 Both steps cannot performed in one pass; this is a known limitation of
222 ``trjconv``. An intermediate temporary XTC files is generated which should
223 be automatically cleaned up unless bad things happened.
224
225 The function tries to honour the input/output formats. For instance, if you
226 want trr output you need to supply a trr file as input and explicitly give
227 the output file also a trr suffix.
228
229 .. Note:: For big trajectories it can **take a very long time**
230 and consume a **large amount of temporary diskspace**.
231
232 We follow the `g_spatial documentation`_ in preparing the trajectories::
233
234 trjconv -s a.tpr -f a.xtc -o b.xtc -center tric -ur compact -pbc none
235 trjconv -s a.tpr -f b.xtc -o c.xtc -fit rot+trans
236
237 .. _`g_spatial documentation`: http://oldwiki.gromacs.org/index.php/Manual:g_spatial_4.0.3
238 """
239 if xy:
240 fitmode = 'rotxy+transxy'
241 kwargs.pop('fit', None)
242 else:
243 fitmode = kwargs.pop('fit', 'rot+trans')
244
245 intrj = kwargs.pop('f', None)
246
247
248 suffix = os.path.splitext(intrj)[1]
249 if not suffix in ('xtc', 'trr'):
250 suffix = '.xtc'
251 outtrj = kwargs.pop('o', None)
252
253 ndx = kwargs.pop('n', None)
254 ndxcompact = kwargs.pop('n1', ndx)
255
256
257 inpfit = kwargs.pop('input', ('backbone', 'protein','system'))
258 try:
259 _inpcompact = inpfit[1:]
260 except TypeError:
261 _inpcompact = None
262 inpcompact = kwargs.pop('input1', _inpcompact)
263
264 fd, tmptrj = tempfile.mkstemp(suffix=suffix, prefix='fitted_')
265
266 logger.info("Input trajectory: %(intrj)r\nOutput trajectory: %(outtrj)r"% vars())
267 logger.info("... writing temporary trajectory %(tmptrj)r (will be auto-cleaned)." % vars())
268 sys.stdout.flush()
269 try:
270 trj_compact(f=intrj, o=tmptrj, n=ndxcompact, input=inpcompact, **kwargs)
271 trj_xyfitted(f=tmptrj, o=outtrj, n=ndx, fit=fitmode, input=inpfit, **kwargs)
272 finally:
273 utilities.unlink_gmx(tmptrj)
274
275 -def cat(prefix="md", dirname=os.path.curdir, partsdir="parts", fulldir="full",
276 resolve_multi="pass"):
277 """Concatenate all parts of a simulation.
278
279 The xtc, trr, and edr files in *dirname* such as prefix.xtc,
280 prefix.part0002.xtc, prefix.part0003.xtc, ... are
281
282 1) moved to the *partsdir* (under *dirname*)
283 2) concatenated with the Gromacs tools to yield prefix.xtc, prefix.trr,
284 prefix.edr, prefix.gro (or prefix.md) in *dirname*
285 3) Store these trajectories in *fulldir*
286
287 .. Note:: Trajectory files are *never* deleted by this function to avoid
288 data loss in case of bugs. You will have to clean up yourself
289 by deleting *dirname*/*partsdir*.
290
291 Symlinks for the trajectories are *not* handled well and
292 break the function. Use hard links instead.
293
294 .. Warning:: If an exception occurs when running this function then make
295 doubly and triply sure where your files are before running
296 this function again; otherwise you might **overwrite data**.
297 Possibly you will need to manually move the files from *partsdir*
298 back into the working directory *dirname*; this should onlu overwrite
299 generated files so far but *check carefully*!
300
301 :Keywords:
302 *prefix*
303 deffnm of the trajectories [md]
304 *resolve_multi"
305 how to deal with multiple "final" gro or pdb files: normally there should
306 only be one but in case of restarting from the checkpoint of a finished
307 simulation one can end up with multiple identical ones.
308 - "pass" : do nothing and log a warning
309 - "guess" : take prefix.pdb or prefix.gro if it exists, otherwise the one of
310 prefix.partNNNN.gro|pdb with the highes NNNN
311 *dirname*
312 change to *dirname* and assume all tarjectories are located there [.]
313 *partsdir*
314 directory where to store the input files (they are moved out of the way);
315 *partsdir* must be manually deleted [parts]
316 *fulldir*
317 directory where to store the final results [full]
318 """
319
320 gmxcat = {'xtc': gromacs.trjcat,
321 'trr': gromacs.trjcat,
322 'edr': gromacs.eneconv,
323 'log': utilities.cat,
324 }
325
326 def _cat(prefix, ext, partsdir=partsdir, fulldir=fulldir):
327 filenames = glob_parts(prefix, ext)
328 if ext.startswith('.'):
329 ext = ext[1:]
330 outfile = os.path.join(fulldir, prefix + '.' + ext)
331 if not filenames:
332 return None
333 nonempty_files = []
334 for f in filenames:
335 if os.stat(f).st_size == 0:
336 logger.warn("File %(f)r is empty, skipping" % vars())
337 continue
338 if os.path.islink(f):
339
340 errmsg = "Symbolic links do not work (file %(f)r), sorry. " \
341 "CHECK LOCATION OF FILES MANUALLY BEFORE RUNNING gromacs.cbook.cat() AGAIN!" % vars()
342 logger.exception(errmsg)
343 raise NotImplementedError(errmsg)
344 shutil.move(f, partsdir)
345 nonempty_files.append(f)
346 filepaths = [os.path.join(partsdir, f) for f in nonempty_files]
347 gmxcat[ext](f=filepaths, o=outfile)
348 return outfile
349
350 _resolve_options = ("pass", "guess")
351 if not resolve_multi in _resolve_options:
352 raise ValueError("resolve_multi must be one of %(_resolve_options)r, "
353 "not %(resolve_multi)r" % vars())
354
355 if fulldir == os.path.curdir:
356 wmsg = "Using the current directory as fulldir can potentially lead to data loss if you run this function multiple times."
357 logger.warning(wmsg)
358 warnings.warn(wmsg, category=BadParameterWarning)
359
360 with utilities.in_dir(dirname, create=False):
361 utilities.mkdir_p(partsdir)
362 utilities.mkdir_p(fulldir)
363 for ext in ('log', 'edr', 'trr', 'xtc'):
364 logger.info("[%(dirname)s] concatenating %(ext)s files...", vars())
365 outfile = _cat(prefix, ext, partsdir)
366 logger.info("[%(dirname)s] created %(outfile)r", vars())
367 for ext in ('gro', 'pdb'):
368 filenames = glob_parts(prefix, ext)
369 if len(filenames) == 0:
370 continue
371 elif len(filenames) == 1:
372 pick = filenames[0]
373 else:
374 if resolve_multi == "pass":
375 logger.warning("[%(dirname)s] too many output structures %(filenames)r, "
376 "cannot decide which one --- resolve manually!", vars())
377 for f in filenames:
378 shutil.move(f, partsdir)
379 continue
380 elif resolve_multi == "guess":
381 pick = prefix + '.' + ext
382 if not pick in filenames:
383 pick = filenames[-1]
384 final = os.path.join(fulldir, prefix + '.' + ext)
385 shutil.copy(pick, final)
386 for f in filenames:
387 shutil.move(f, partsdir)
388 logger.info("[%(dirname)s] collected final structure %(final)r "
389 "(from %(pick)r)", vars())
390
391
392 partsdirpath = utilities.realpath(dirname, partsdir)
393 logger.warn("[%(dirname)s] cat() complete in %(fulldir)r but original files "
394 "in %(partsdirpath)r must be manually removed", vars())
395
397 """Find files from a continuation run"""
398 if ext.startswith('.'):
399 ext = ext[1:]
400 files = glob.glob(prefix+'.'+ext) + glob.glob(prefix+'.part[0-9][0-9][0-9][0-9].'+ext)
401 files.sort()
402 return files
403
406 """A iterator that transparently provides frames from a trajectory.
407
408 The iterator chops a trajectory into individual frames for
409 analysis tools that only work on separate structures such as
410 ``gro`` or ``pdb`` files. Instead of turning the whole trajectory
411 immediately into pdb files (and potentially filling the disk), the
412 iterator can be instructed to only provide a fixed number of
413 frames and compute more frames when needed.
414
415 .. Note:: Setting a limit on the number of frames on disk can lead
416 to longish waiting times because ``trjconv`` must
417 re-seek to the middle of the trajectory and the only way
418 it can do this at the moment is by reading frames
419 sequentially. This might still be preferrable to filling
420 up a disk, though.
421
422 .. Warning:: The *maxframes* option is not implemented yet; use
423 the *dt* option or similar to keep the number of frames
424 manageable.
425 """
426
427 - def __init__(self, structure, trj, maxframes=None, format='pdb', **kwargs):
428 """Set up the Frames iterator.
429
430 :Arguments:
431 structure
432 name of a structure file (tpr, pdb, ...)
433 trj
434 name of the trajectory (xtc, trr, ...)
435 format
436 output format for the frames, eg "pdb" or "gro" [pdb]
437 maxframes : int
438 maximum number of frames that are extracted to disk at
439 one time; set to ``None`` to extract the whole trajectory
440 at once. [``None``]
441 kwargs
442 All other arguments are passed to
443 `class:~gromacs.tools.Trjconv`; the only options that
444 cannot be changed are *sep* and the output file name *o*.
445
446 """
447 self.structure = structure
448 self.trj = trj
449 self.maxframes = maxframes
450 if not self.maxframes is None:
451 raise NotImplementedError('sorry, maxframes feature not implemented yet')
452
453 self.framedir = tempfile.mkdtemp(prefix="Frames_", suffix='_'+format)
454 self.frameprefix = os.path.join(self.framedir, 'frame')
455 self.frametemplate = self.frameprefix + '%d' + '.' + format
456 self.frameglob = self.frameprefix + '*' + '.' + format
457 kwargs['sep'] = True
458 kwargs['o'] = self.frameprefix + '.' + format
459 kwargs.setdefault('input', ('System',))
460 self.extractor = tools.Trjconv(s=self.structure, f=self.trj, **kwargs)
461
462
463
464 self.framenumber = 0
465
466 self.totalframes = 0
467
469 """Extract frames from the trajectory to the temporary directory."""
470
471 self.extractor.run()
472
473 @property
475 """Unordered list of all frames currently held on disk."""
476 return glob.glob(self.frameglob)
477
478 @property
480 return self.frametemplate % self.framenumber
481
483 """Primitive iterator."""
484 frames = self.all_frames
485 if len(frames) == 0:
486 self.extract()
487 frames = self.all_frames
488
489
490
491 for i in xrange(len(frames)):
492 self.framenumber = i
493 yield self.current_framename
494 self.totalframes += len(frames)
495
497 """Delete all frames."""
498 for frame in glob.glob(self.frameglob):
499 os.unlink(frame)
500
502 """Clean up all temporary frames (which can be HUGE)."""
503 shutil.rmtree(self.framedir)
504 self.framedir = None
505
507 if not self.framedir is None:
508 self.cleanup()
509
514 """Run ``gromacs.grompp`` and return the total charge of the system.
515
516 :Arguments:
517 The arguments are the ones one would pass to :func:`gromacs.grompp`.
518 :Returns:
519 The total charge as reported
520
521 Some things to keep in mind:
522
523 * The stdout output of grompp is not shown. This can make debugging
524 pretty hard. Try running the normal :func:`gromacs.grompp` command and
525 analyze the output if the debugging messages are not sufficient.
526 * Check that ``qtot`` is correct; because the function is based on pattern
527 matching of the output it can break when the output format changes.
528 """
529
530
531 qtot_pattern = re.compile('System has non-zero total charge: *(?P<qtot>[-+]?\d*\.\d+[eE][-+]\d+)')
532 kwargs['stdout'] = False
533 rc, output, junk = gromacs.grompp(*args, **kwargs)
534 qtot = 0
535 for line in output.split('\n'):
536 m = qtot_pattern.search(line)
537 if m:
538 qtot = float(m.group('qtot'))
539 break
540 logger.info("system total charge qtot = %(qtot)r" % vars())
541 return qtot
542
544 """Generate a string that can be added to a mdp 'include = ' line."""
545 include_paths = [os.path.expanduser(p) for p in dirs]
546 return ' -I'.join([''] + include_paths)
547
549 """Set the mdp *include* key in the *kwargs* dict.
550
551 1. Add the directory containing *topology*.
552 2. Add all directories appearing under the key *includes*
553 3. Generate a string of the form "-Idir1 -Idir2 ..." that
554 is stored under the key *include* (the corresponding
555 mdp parameter)
556
557 By default, the directories ``.`` and ``..`` are also added to the
558 *include* string for the mdp; when fed into
559 :func:`gromacs.cbook.edit_mdp` it will result in a line such as ::
560
561 include = -I. -I.. -I../topology_dir ....
562
563 Note that the user can always override the behaviour by setting
564 the *include* keyword herself; in this case this function does
565 nothing.
566
567 If no *kwargs* were supplied then a dict is generated with the
568 single *include* entry.
569
570 :Arguments:
571 *topology* : top filename
572 Topology file; the name of the enclosing directory is added
573 to the include path (if supplied) [``None``]
574 *kwargs* : dict
575 Optional dictionary of mdp keywords; will be modified in place.
576 If it contains the *includes* keyword with either a single string
577 or a list of strings then these paths will be added to the
578 include statement.
579 :Returns: *kwargs* with the *include* keyword added if it did not
580 exist previously; if the keyword already existed, nothing
581 happens.
582
583 .. Note:: The *kwargs* dict is **modified in place**. This
584 function is a bit of a hack. It might be removed once
585 all setup functions become methods in a nice class.
586 """
587 if kwargs is None:
588 kwargs = {}
589
590 if not topology is None:
591
592
593 topology_dir = os.path.dirname(topology)
594 include_dirs = ['.', '..', topology_dir]
595
596 include_dirs.extend(asiterable(kwargs.pop('includes', [])))
597
598
599
600 kwargs.setdefault('include', _mdp_include_string(include_dirs))
601 return kwargs
602
605 """Create a processed topology.
606
607 The processed (or portable) topology file does not contain any
608 ``#include`` statements and hence can be easily copied around. It
609 also makes it possible to re-grompp without having any special itp
610 files available.
611
612 :Arguments:
613 *topol*
614 topology file
615 *struct*
616 coordinat (structure) file
617
618 :Keywords:
619 *processed*
620 name of the new topology file; if not set then it is named like
621 *topol* but with ``pp_`` prepended
622 *includes*
623 path or list of paths of directories in which itp files are
624 searched for
625
626 :Returns: full path to the processed trajectory
627 """
628 _topoldir, _topol = os.path.split(topol)
629 processed = kwargs.pop('processed', os.path.join(_topoldir, 'pp_'+_topol))
630 mdp_kwargs = add_mdp_includes(topol, kwargs)
631 with tempfile.NamedTemporaryFile(suffix='.mdp') as mdp:
632 mdp.write('; empty mdp file\ninclude = %(include)s\n' % mdp_kwargs)
633 mdp.flush()
634 try:
635 gromacs.grompp(v=False, f=mdp.name, p=topol, c=struct, pp=processed)
636 finally:
637 utilities.unlink_gmx('topol.tpr', 'mdout.mdp')
638 return utilities.realpath(processed)
639
641 """Return the volume in nm^3 of structure file *f*.
642
643 (Uses :func:`gromacs.editconf`; error handling is not good)
644 """
645 fd, temp = tempfile.mkstemp('.gro')
646 try:
647 rc,out,err = gromacs.editconf(f=f, o=temp, stdout=False)
648 finally:
649 os.unlink(temp)
650 return [float(x.split()[1]) for x in out.splitlines()
651 if x.startswith('Volume:')][0]
652
653
654
655
656
657 -def edit_mdp(mdp, new_mdp=None, extend_parameters=None, **substitutions):
658 """Change values in a Gromacs mdp file.
659
660 Parameters and values are supplied as substitutions, eg ``nsteps=1000``.
661
662 By default the template mdp file is **overwritten in place**.
663
664 If a parameter does not exist in the template then it cannot be substituted
665 and the parameter/value pair is returned. The user has to check the
666 returned list in order to make sure that everything worked as expected. At
667 the moment it is not possible to automatically append the new values to the
668 mdp file because of ambiguities when having to replace dashes in parameter
669 names with underscores (see the notes below on dashes/underscores).
670
671 If a parameter is set to the value ``None`` then it will be ignored.
672
673 :Arguments:
674 *mdp* : filename
675 filename of input (and output filename of ``new_mdp=None``)
676 *new_mdp* : filename
677 filename of alternative output mdp file [None]
678 *extend_parameters* : string or list of strings
679 single parameter or list of parameters for which the new values
680 should be appended to the existing value in the mdp file. This
681 makes mostly sense for a single parameter, namely 'include', which
682 is set as the default. Set to ``[]`` to disable. ['include']
683 *substitutions*
684 parameter=value pairs, where parameter is defined by the Gromacs mdp file;
685 dashes in parameter names have to be replaced by underscores.
686
687 :Returns:
688 Dict of parameters that have *not* been substituted.
689
690 **Example** ::
691
692 edit_mdp('md.mdp', new_mdp='long_md.mdp', nsteps=100000, nstxtcout=1000, lincs_iter=2)
693
694 .. Note::
695
696 * Dashes in Gromacs mdp parameters have to be replaced by an underscore
697 when supplied as python keyword arguments (a limitation of python). For example
698 the MDP syntax is ``lincs-iter = 4`` but the corresponding keyword would be
699 ``lincs_iter = 4``.
700 * If the keyword is set as a dict key, eg ``mdp_params['lincs-iter']=4`` then one
701 does not have to substitute.
702 * Parameters *aa_bb* and *aa-bb* are considered the same (although this should
703 not be a problem in practice because there are no mdp parameters that only
704 differ by a underscore).
705 * This code is more compact in ``Perl`` as one can use ``s///`` operators:
706 ``s/^(\s*${key}\s*=\s*).*/$1${val}/``
707
708 .. SeeAlso:: One can also load the mdp file with
709 :class:`gromacs.formats.MDP`, edit the object (a dict), and save it again.
710 """
711 if new_mdp is None:
712 new_mdp = mdp
713 if extend_parameters is None:
714 extend_parameters = ['include']
715 else:
716 extend_parameters = list(asiterable(extend_parameters))
717
718
719 substitutions = dict([(k,v) for k,v in substitutions.items() if not v is None])
720
721 params = substitutions.keys()[:]
722
723 def demangled(p):
724 """Return a RE string that matches the parameter."""
725 return p.replace('_', '[-_]')
726
727 patterns = dict([(parameter,
728 re.compile("""\
729 (?P<assignment>\s*%s\s*=\s*) # parameter == everything before the value
730 (?P<value>[^;]*) # value (stop before comment=;)
731 (?P<comment>\s*;.*)? # optional comment
732 """ % demangled(parameter), re.VERBOSE))
733 for parameter in substitutions])
734
735 target = tempfile.TemporaryFile()
736 with open(mdp) as src:
737 logger.info("editing mdp = %r: %r" % (mdp, substitutions.keys()))
738 for line in src:
739 new_line = line.strip()
740 for p in params[:]:
741 m = patterns[p].match(new_line)
742 if m:
743
744
745
746
747 if m.group('comment') is None:
748 comment = ''
749 else:
750 comment = " "+m.group('comment')
751 assignment = m.group('assignment')
752 if not assignment.endswith(' '):
753 assignment += ' '
754
755 new_line = assignment
756 if p in extend_parameters:
757
758 new_line += str(m.group('value')) + ' '
759 new_line += str(substitutions[p]) + comment
760 params.remove(p)
761 break
762 target.write(new_line+'\n')
763 target.seek(0)
764
765 with open(new_mdp, 'w') as final:
766 shutil.copyfileobj(target, final)
767 target.close()
768
769 if len(params) > 0:
770 logger.warn("Not substituted in %(new_mdp)r: %(params)r" % vars())
771 return dict([(p, substitutions[p]) for p in params])
772
773 -def edit_txt(filename, substitutions, newname=None):
774 """Primitive text file stream editor.
775
776 This function can be used to edit free-form text files such as the
777 topology file. By default it does an **in-place edit** of
778 *filename*. If *newname* is supplied then the edited
779 file is written to *newname*.
780
781 :Arguments:
782 *filename*
783 input text file
784 *substitutions*
785 substitution commands (see below for format)
786 *newname*
787 output filename; if ``None`` then *filename* is changed in
788 place [``None``]
789
790 *substitutions* is a list of triplets; the first two elements are regular
791 expression strings, the last is the substitution value. It mimics
792 ``sed`` search and replace. The rules for *substitutions*:
793
794 .. productionlist::
795 substitutions: "[" search_replace_tuple, ... "]"
796 search_replace_tuple: "(" line_match_RE "," search_RE "," replacement ")"
797 line_match_RE: regular expression that selects the line (uses match)
798 search_RE: regular expression that is searched in the line
799 replacement: replacement string for search_RE
800
801 Running :func:`edit_txt` does pretty much what a simple ::
802
803 sed /line_match_RE/s/search_RE/replacement/
804
805 with repeated substitution commands does.
806
807 Special replacement values:
808 - ``None``: the rule is ignored
809 - ``False``: the line is deleted (even if other rules match)
810
811 .. note::
812
813 * No sanity checks are performed and the substitutions must be supplied
814 exactly as shown.
815 * All substitutions are applied to a line; thus the order of the substitution
816 commands may matter when one substitution generates a match for a subsequent rule.
817 * If replacement is set to ``None`` then the whole expression is ignored and
818 whatever is in the template is used. To unset values you must provided an
819 empty string or similar.
820 * Delete a matching line if replacement=``False``.
821 """
822 if newname is None:
823 newname = filename
824
825
826
827 _substitutions = [{'lRE': re.compile(str(lRE)),
828 'sRE': re.compile(str(sRE)),
829 'repl': repl}
830 for lRE,sRE,repl in substitutions if not repl is None]
831
832 target = tempfile.TemporaryFile()
833 with open(filename) as src:
834 logger.info("editing txt = %r (%d substitutions)" % (filename, len(substitutions)))
835 for line in src:
836 keep_line = True
837 for subst in _substitutions:
838 m = subst['lRE'].match(line)
839 if m:
840 logger.debug('match: '+line.rstrip())
841 if subst['repl'] is False:
842 keep_line = False
843 else:
844 line = subst['sRE'].sub(str(subst['repl']), line)
845 logger.debug('replaced: '+line.rstrip())
846 if keep_line:
847 target.write(line)
848 else:
849 logger.debug("Deleting line %r", line)
850
851 target.seek(0L)
852 with open(newname, 'w') as final:
853 shutil.copyfileobj(target, final)
854 target.close()
855 logger.info("edited txt = %(newname)r" % vars())
856
857
858
859
860
861
862
863 NDXLIST = re.compile(r""">\s+\n # '> ' marker line from '' input (input not echoed)
864 \n # empty line
865 (?P<LIST> # list of groups
866 ( # consists of repeats of the same pattern:
867 \s*\d+ # group number
868 \s+[^\s]+\s*: # group name, separator ':'
869 \s*\d+\satoms # number of atoms in group
870 \n
871 )+ # multiple repeats
872 )""", re.VERBOSE)
873
874
875 NDXGROUP = re.compile(r"""
876 \s*(?P<GROUPNUMBER>\d+) # group number
877 \s+(?P<GROUPNAME>[^\s]+)\s*: # group name, separator ':'
878 \s*(?P<NATOMS>\d+)\satoms # number of atoms in group
879 """, re.VERBOSE)
882 """make_ndx that captures all output
883
884 Standard :func:`~gromacs.make_ndx` command with the input and
885 output pre-set in such a way that it can be conveniently used for
886 :func:`parse_ndxlist`.
887
888 Example::
889 ndx_groups = parse_ndxlist(make_ndx_captured(n=ndx)[0])
890
891 Note that the convenient :func:`get_ndx_groups` function does exactly
892 that and can probably used in most cases.
893
894 :Arguments:
895 keywords are passed on to :func:`~gromacs.make_ndx`
896 :Returns:
897 (*returncode*, *output*, ``None``)
898 """
899 kwargs['stdout']=False
900 kwargs['stderr']=True
901 user_input = kwargs.pop('input',[])
902 user_input = [cmd for cmd in user_input if cmd != 'q']
903 kwargs['input'] = user_input + ['', 'q']
904 return gromacs.make_ndx(**kwargs)
905
907 """Return a list of index groups in the index file *ndx*.
908
909 :Arguments:
910 - *ndx* is a Gromacs index file.
911 - kwargs are passed to :func:`make_ndx_captured`.
912
913 :Returns:
914 list of groups as supplied by :func:`parse_ndxlist`
915
916 Alternatively, load the index file with
917 :class:`gromacs.formats.NDX` for full control.
918 """
919 fd, tmp_ndx = tempfile.mkstemp(suffix='.ndx')
920 kwargs['o'] = tmp_ndx
921 try:
922 g = parse_ndxlist(make_ndx_captured(n=ndx, **kwargs)[1])
923 finally:
924 utilities.unlink_gmx(tmp_ndx)
925 return g
926
928 """Parse output from make_ndx to build list of index groups::
929
930 groups = parse_ndxlist(output)
931
932 output should be the standard output from ``make_ndx``, e.g.::
933
934 rc,output,junk = gromacs.make_ndx(..., input=('', 'q'), stdout=False, stderr=True)
935
936 (or simply use
937
938 rc,output,junk = cbook.make_ndx_captured(...)
939
940 which presets input, stdout and stderr; of course input can be overriden.)
941
942 :Returns:
943 The function returns a list of dicts (``groups``) with fields
944
945 name
946 name of the groups
947 nr
948 number of the group (starts at 0)
949 natoms
950 number of atoms in the group
951
952 """
953
954 m = NDXLIST.search(output)
955 grouplist = m.group('LIST')
956 return parse_groups(grouplist)
957
959 """Parse ``make_ndx`` output and return groups as a list of dicts."""
960 groups = []
961 for line in output.split('\n'):
962 m = NDXGROUP.match(line)
963 if m:
964 d = m.groupdict()
965 groups.append({'name': d['GROUPNAME'],
966 'nr': int(d['GROUPNUMBER']),
967 'natoms': int(d['NATOMS'])})
968 return groups
969
971 """Build an index file with specified groups and the combined group.
972
973 This is *not* a full blown selection parser a la Charmm, VMD or
974 MDAnalysis but a very quick hack.
975
976 **Example**
977
978 How to use the :class:`IndexBuilder`::
979
980 G = gromacs.cbook.IndexBuilder('md_posres.pdb',
981 ['S312:OG','T313:OG1','A38:O','A309:O','@a62549 & r NA'],
982 offset=-9, out_ndx='selection.ndx')
983 groupname, ndx = G.combine()
984 del G
985
986 The residue numbers are given with their canonical resids from the
987 sequence or pdb. *offset=-9* says that one calculates Gromacs topology
988 resids by subtracting 9 from the canonical resid.
989
990 The combined selection is ``OR`` ed by default and written to
991 *selection.ndx*. One can also add all the groups in the initial *ndx*
992 file (or the :program:`make_ndx` default groups) to the output (see the
993 *defaultgroups* keyword for :meth:`IndexBuilder.combine`).
994
995 Generating an index file always requires calling
996 :meth:`~IndexBuilder.combine` even if there is only a single group.
997
998 Deleting the class removes all temporary files associated with it (see
999 :attr:`IndexBuilder.indexfiles`).
1000
1001 :Raises:
1002 If an empty group is detected (which does not always work) then a
1003 :exc:`gromacs.BadParameterWarning` is issued.
1004
1005 :Bugs:
1006 If ``make_ndx`` crashes with an unexpected error then this is fairly hard to
1007 diagnose. For instance, in certain cases it segmentation faults when a tpr
1008 is provided as a *struct* file and the resulting error messages becomes ::
1009
1010 GromacsError: [Errno -11] Gromacs tool failed
1011 Command invocation: make_ndx -o /tmp/tmp_Na1__NK7cT3.ndx -f md_posres.tpr
1012
1013 In this case run the command invocation manually to see what the problem
1014 could be.
1015
1016
1017 .. SeeAlso:: In some cases it might be more straightforward to use
1018 :class:`gromacs.formats.NDX`.
1019
1020 """
1021
1022 - def __init__(self, struct=None, selections=None, names=None, name_all=None,
1023 ndx=None, out_ndx="selection.ndx", offset=0):
1024 """Build a index group from the selection arguments.
1025
1026 If selections and a structure file are supplied then the individual
1027 selections are constructed with separate calls to
1028 :func:`gromacs.make_ndx`. Use :meth:`IndexBuilder.combine` to combine
1029 them into a joint selection.
1030
1031 :Arguments:
1032
1033 struct : filename
1034 Structure file (tpr, pdb, ...)
1035
1036 selections : list
1037 The list must contain strings or tuples, which must be be one of
1038 the following constructs:
1039
1040 "<1-letter aa code><resid>[:<atom name]"
1041
1042 Selects the CA of the residue or the specified atom
1043 name.
1044
1045 example: ``"S312:OA"`` or ``"A22"`` (equivalent to ``"A22:CA"``)
1046
1047 ("<1-letter aa code><resid>", "<1-letter aa code><resid>, ["<atom name>"])
1048
1049 Selects a *range* of residues. If only two residue
1050 identifiers are provided then all atoms are
1051 selected. With an optional third atom identifier,
1052 only this atom anme is selected for each residue
1053 in the range. [EXPERIMENTAL]
1054
1055 "@<make_ndx selection>"
1056
1057 The ``@`` letter introduces a verbatim ``make_ndx``
1058 command. It will apply the given selection without any
1059 further processing or checks.
1060
1061 example: ``"@a 6234 - 6238"`` or ``'@"SOL"'`` (note the quoting)
1062 or ``"@r SER & r 312 & t OA"``.
1063
1064 names : list
1065 Strings to name the selections; if not supplied or if individuals
1066 are ``None`` then a default name is created.
1067
1068 offset : int, dict
1069 This number is added to the resids in the first selection scheme; this
1070 allows names to be the same as in a crystal structure. If offset is a
1071 dict then it is used to directly look up the resids.
1072
1073 ndx : filename or list of filenames
1074 Optional input index file(s).
1075
1076 out_ndx : filename
1077 Output index file.
1078
1079 """
1080 self.structure = struct
1081 self.ndx = ndx
1082 self.output = out_ndx
1083 self.name_all = name_all
1084
1085
1086
1087
1088
1089
1090 self.offset = offset
1091
1092
1093 self.command_counter = 0
1094
1095 if selections is None:
1096 selections = []
1097 if not utilities.iterable(selections):
1098 selections = [selections]
1099 self.selections = selections
1100 if names is None:
1101 names = [None] * len(selections)
1102
1103
1104
1105 self.make_ndx = tools.Make_ndx(f=self.structure, n=self.ndx,
1106 stdout=False, stderr=False)
1107
1108
1109
1110
1111 self.indexfiles = dict([self.parse_selection(selection, name)
1112 for selection, name in zip(selections, names)])
1113 self.names = self.indexfiles.keys()
1114
1116 """Returns resid in the Gromacs index by transforming with offset."""
1117 try:
1118 gmx_resid = int(self.offset[resid])
1119 except (TypeError, IndexError):
1120 gmx_resid = resid + self.offset
1121 except KeyError:
1122 raise KeyError("offset must be a dict that contains the gmx resid for %d" % resid)
1123 return gmx_resid
1124
1125 - def combine(self, name_all=None, out_ndx=None, operation='|', defaultgroups=False):
1126 """Combine individual groups into a single one and write output.
1127
1128 :Keywords:
1129 name_all : string
1130 Name of the combined group, ``None`` generates a name. [``None``]
1131 out_ndx : filename
1132 Name of the output file that will contain the individual groups
1133 and the combined group. If ``None`` then default from the class
1134 constructor is used. [``None``]
1135 operation : character
1136 Logical operation that is used to generate the combined group from
1137 the individual groups: "|" (OR) or "&" (AND) ["|"]
1138 defaultgroups : bool
1139 ``True``: append everything to the default groups produced by
1140 :program:`make_ndx` (or rather, the groups provided in the ndx file on
1141 initialization --- if this was ``None`` then these are truly default groups);
1142 ``False``: only use the generated groups
1143
1144 :Returns:
1145 ``(combinedgroup_name, output_ndx)``, a tuple showing the
1146 actual group name and the name of the file; useful when all names are autogenerated.
1147
1148 .. Warning:: The order of the atom numbers in the combined group is
1149 *not* guaranteed to be the same as the selections on input because
1150 ``make_ndx`` sorts them ascending. Thus you should be careful when
1151 using these index files for calculations of angles and dihedrals.
1152 Use :class:`gromacs.formats.NDX` in these cases.
1153 """
1154 if name_all is None:
1155 name_all = self.name_all
1156 if name_all is None:
1157 name_all = operation.join(self.indexfiles)
1158 if not operation in ('|', '&'):
1159 raise ValueError("Illegal operation %r, only '|' (OR) and '&' (AND) allowed." %
1160 operation)
1161 if out_ndx is None:
1162 out_ndx = self.output
1163
1164 if defaultgroups:
1165
1166 fd, default_ndx = tempfile.mkstemp(suffix='.ndx', prefix='default__')
1167 try:
1168 self.make_ndx(o=default_ndx, input=['q'])
1169 except:
1170 utilities.unlink_gmx(default_ndx)
1171 raise
1172 ndxfiles = [default_ndx]
1173 else:
1174 ndxfiles = []
1175
1176 ndxfiles.extend(self.indexfiles.values())
1177
1178 if len(self.selections) == 1:
1179
1180 try:
1181 cmd = ['', 'q']
1182 rc,out,err = self.make_ndx(n=ndxfiles, o=out_ndx, input=cmd)
1183 if self._is_empty_group(out):
1184 warnings.warn("No atoms found for %(cmd)r" % vars(),
1185 category=BadParameterWarning)
1186 finally:
1187 if defaultgroups:
1188 utilities.unlink_gmx(default_ndx)
1189 else:
1190
1191 try:
1192 fd, tmp_ndx = tempfile.mkstemp(suffix='.ndx', prefix='combined__')
1193
1194 operation = ' '+operation.strip()+' '
1195 cmd = [operation.join(['"%s"' % gname for gname in self.indexfiles]),
1196 '', 'q']
1197 rc,out,err = self.make_ndx(n=ndxfiles, o=tmp_ndx, input=cmd)
1198 if self._is_empty_group(out):
1199 warnings.warn("No atoms found for %(cmd)r" % vars(),
1200 category=BadParameterWarning)
1201
1202
1203 groups = parse_ndxlist(out)
1204 last = groups[-1]
1205
1206 name_cmd = ["name %d %s" % (last['nr'], name_all),
1207 'q']
1208 rc,out,err = self.make_ndx(n=tmp_ndx, o=out_ndx, input=name_cmd)
1209
1210
1211
1212
1213 finally:
1214 utilities.unlink_gmx(tmp_ndx)
1215 if defaultgroups:
1216 utilities.unlink_gmx(default_ndx)
1217
1218 return name_all, out_ndx
1219
1220 - def cat(self, out_ndx=None):
1221 """Concatenate input index files.
1222
1223 Generate a new index file that contains the default Gromacs index
1224 groups (if a structure file was defined) and all index groups from the
1225 input index files.
1226
1227 :Arguments:
1228 out_ndx : filename
1229 Name of the output index file; if ``None`` then use the default
1230 provided to the constructore. [``None``].
1231 """
1232 if out_ndx is None:
1233 out_ndx = self.output
1234 self.make_ndx(o=out_ndx, input=['q'])
1235 return out_ndx
1236
1238 """Retuns (groupname, filename) with index group."""
1239
1240 if type(selection) is tuple:
1241
1242 process = self._process_range
1243 elif selection.startswith('@'):
1244
1245 process = self._process_command
1246 selection = selection[1:]
1247 else:
1248 process = self._process_residue
1249 return process(selection, name)
1250
1252 """Process ``make_ndx`` command and return name and temp index file."""
1253
1254 self.command_counter += 1
1255 if name is None:
1256 name = "CMD%03d" % self.command_counter
1257
1258
1259
1260 try:
1261 fd, tmp_ndx = tempfile.mkstemp(suffix='.ndx', prefix='tmp_'+name+'__')
1262 cmd = [command, '', 'q']
1263
1264 rc,out,err = self.make_ndx(o=tmp_ndx, input=cmd)
1265 self.check_output(out, "No atoms found for selection %(command)r." % vars(), err=err)
1266
1267
1268
1269
1270 groups = parse_ndxlist(out)
1271 last = groups[-1]
1272
1273 fd, ndx = tempfile.mkstemp(suffix='.ndx', prefix=name+'__')
1274 name_cmd = ["keep %d" % last['nr'],
1275 "name 0 %s" % name, 'q']
1276 rc,out,err = self.make_ndx(n=tmp_ndx, o=ndx, input=name_cmd)
1277 finally:
1278 utilities.unlink_gmx(tmp_ndx)
1279
1280 return name, ndx
1281
1282
1283 RESIDUE = re.compile("""
1284 (?P<aa>([ACDEFGHIKLMNPQRSTVWY]) # 1-letter amino acid
1285 | # or
1286 (\S\S\S) # 3-letter residue name
1287 )
1288 (?P<resid>\d+) # resid
1289 (: # separator ':'
1290 (?P<atom>\w+) # atom name
1291 )? # possibly one
1292 """, re.VERBOSE)
1293
1295 """Process residue/atom selection and return name and temp index file."""
1296
1297 if name is None:
1298 name = selection.replace(':', '_')
1299
1300
1301 m = self.RESIDUE.match(selection)
1302 if not m:
1303 raise ValueError("Selection %(selection)r is not valid." % vars())
1304
1305 gmx_resid = self.gmx_resid(int(m.group('resid')))
1306 residue = m.group('aa')
1307 if len(residue) == 1:
1308 gmx_resname = utilities.convert_aa_code(residue)
1309 else:
1310 gmx_resname = residue
1311 gmx_atomname = m.group('atom')
1312 if gmx_atomname is None:
1313 gmx_atomname = 'CA'
1314
1315
1316 _selection = 'r %(gmx_resid)d & r %(gmx_resname)s & a %(gmx_atomname)s' % vars()
1317 cmd = ['keep 0', 'del 0',
1318 _selection,
1319 'name 0 %(name)s' % vars(),
1320 'q']
1321 fd, ndx = tempfile.mkstemp(suffix='.ndx', prefix=name+'__')
1322 rc,out,err = self.make_ndx(n=self.ndx, o=ndx, input=cmd)
1323 self.check_output(out, "No atoms found for "
1324 "%(selection)r --> %(_selection)r" % vars())
1325
1326
1327
1328
1329 return name, ndx
1330
1332 """Process a range selection.
1333
1334 ("S234", "A300", "CA") --> selected all CA in this range
1335 ("S234", "A300") --> selected all atoms in this range
1336
1337 .. Note:: Ignores residue type, only cares about the resid (but still required)
1338 """
1339
1340 try:
1341 first, last, gmx_atomname = selection
1342 except ValueError:
1343 try:
1344 first, last = selection
1345 gmx_atomname = '*'
1346 except:
1347 logger.error("%r is not a valid range selection", selection)
1348 raise
1349 if name is None:
1350 name = "%(first)s-%(last)s_%(gmx_atomname)s" % vars()
1351
1352 _first = self._translate_residue(first, default_atomname=gmx_atomname)
1353 _last = self._translate_residue(last, default_atomname=gmx_atomname)
1354
1355 _selection = 'r %d - %d & & a %s' % (_first['resid'], _last['resid'], gmx_atomname)
1356 cmd = ['keep 0', 'del 0',
1357 _selection,
1358 'name 0 %(name)s' % vars(),
1359 'q']
1360 fd, ndx = tempfile.mkstemp(suffix='.ndx', prefix=name+'__')
1361 rc,out,err = self.make_ndx(n=self.ndx, o=ndx, input=cmd)
1362 self.check_output(out, "No atoms found for "
1363 "%(selection)r --> %(_selection)r" % vars())
1364
1365
1366
1367
1368 return name, ndx
1369
1370
1372 """Translate selection for a single res to make_ndx syntax."""
1373 m = self.RESIDUE.match(selection)
1374 if not m:
1375 errmsg = "Selection %(selection)r is not valid." % vars()
1376 logger.error(errmsg)
1377 raise ValueError(errmsg)
1378
1379 gmx_resid = self.gmx_resid(int(m.group('resid')))
1380 residue = m.group('aa')
1381 if len(residue) == 1:
1382 gmx_resname = utilities.convert_aa_code(residue)
1383 else:
1384 gmx_resname = residue
1385
1386 gmx_atomname = m.group('atom')
1387 if gmx_atomname is None:
1388 gmx_atomname = default_atomname
1389
1390 return {'resname':gmx_resname, 'resid':gmx_resid, 'atomname':gmx_atomname}
1391
1392
1393
1394 - def check_output(self, make_ndx_output, message=None, err=None):
1395 """Simple tests to flag problems with a ``make_ndx`` run."""
1396 if message is None:
1397 message = ""
1398 else:
1399 message = '\n' + message
1400 def format(output, w=60):
1401 hrule = "====[ GromacsError (diagnostic output) ]".ljust(w,"=")
1402 return hrule + '\n' + str(output) + hrule
1403
1404 rc = True
1405 if self._is_empty_group(make_ndx_output):
1406 warnings.warn("Selection produced empty group.%(message)s"
1407 % vars(), category=GromacsValueWarning)
1408 rc = False
1409 if self._has_syntax_error(make_ndx_output):
1410 rc = False
1411 out_formatted = format(make_ndx_output)
1412 raise GromacsError("make_ndx encountered a Syntax Error, "
1413 "%(message)s\noutput:\n%(out_formatted)s" % vars())
1414 if make_ndx_output.strip() == "":
1415 rc = False
1416 out_formatted = format(err)
1417 raise GromacsError("make_ndx produced no output, "
1418 "%(message)s\nerror output:\n%(out_formatted)s" % vars())
1419 return rc
1420
1422 m = re.search('Group is empty', make_ndx_output)
1423 return not (m is None)
1424
1426 m = re.search('Syntax error:', make_ndx_output)
1427 return not (m is None)
1428
1429
1441
1690