gromacs.formats – Accessing various files

This module contains classes that represent data files on disk. Typically one creates an instance and

  • reads from a file using a read() method, or
  • populates the instance (in the simplest case with a set() method) and then uses the write() method to write the data to disk in the appropriate format.

For function data there typically also exists a plot() method which produces a graph (using matplotlib).

The module defines some classes that are used in other modules; they do not make use of gromacs.tools or gromacs.cbook and can be safely imported at any time.
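A minimal sketch of this read/plot/write pattern (the file names are only assumed examples):

from gromacs.formats import XVG

xvg = XVG()
xvg.read('energy.xvg')          # read and parse an existing XY/NXY xvg file
xvg.plot()                      # quick graph of the data with matplotlib
xvg.write('energy_copy.xvg')    # write the data back out in NXY format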

Classes

class gromacs.formats.XVG(filename=None, names=None, permissive=False, **kwargs)

Class that represents the numerical data in a grace xvg file.

All data must be numerical. NAN and INF values are supported via python’s float() builtin function.

The array attribute can be used to access the array once it has been read and parsed. The ma attribute is a numpy masked array (good for plotting).

Conceptually, the file on disk and the XVG instance are considered the same data. Whenever the filename for I/O (XVG.read() and XVG.write()) is changed then the filename associated with the instance is also changed to reflect the association between file and instance.

With the permissive = True flag one can instruct the file reader to skip unparseable lines. In this case the line numbers of the skipped lines are stored in XVG.corrupted_lineno.
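For example, a sketch of permissive reading (corrupt.xvg is a hypothetical file with some unparseable lines):

from gromacs.formats import XVG

xvg = XVG('corrupt.xvg', permissive=True)   # skip lines that cannot be parsed
data = xvg.array                            # accessing the array triggers reading and parsing
print(xvg.corrupted_lineno)                 # line numbers of the skipped lines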

A number of attributes are defined to give quick access to simple statistics such as

  • mean: mean of all data columns
  • std: standard deviation
  • min: minimum of data
  • max: maximum of data
  • error: error on the mean, taking correlation times into account (see also XVG.set_correlparameters())
  • tc: correlation time of the data (assuming a simple exponential decay of the fluctuations around the mean)

These attributes are numpy arrays that correspond to the data columns, i.e. XVG.array[1:].
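A short sketch of reading off these statistics (assuming an xvg file energy.xvg with one or more data columns):

from gromacs.formats import XVG

xvg = XVG('energy.xvg')
print(xvg.mean, xvg.std)     # one value per data column, i.e. per entry of XVG.array[1:]
print(xvg.error, xvg.tc)     # error of the mean and correlation time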

Note

  • Only simple XY or NXY files are currently supported, not Grace files that contain multiple data sets separated by ‘&’.
  • Any kind of formatting (i.e. xmgrace commands) is discarded.

Initialize the class from an xvg file.

Arguments:
filename

is the xvg file; it can only be of type XY or NXY. If it is supplied then it is read and parsed when XVG.array is accessed.

names

optional labels for the columns (currently only written as comments to the file); either a string with column names separated by commas or a list of strings

permissive

False raises a ValueError and logs an error when encountering data lines that it cannot parse. True ignores those lines and logs a warning; this is a risk because it might read a corrupted input file [False]

stride

Only read every stride-th line of data [1].

savedata

True includes the data (XVG.array and associated caches) when the instance is pickled (see pickle); this is often not desirable because the data are already on disk (the xvg file filename) and the resulting pickle file can become very big. False omits those data from a pickle. [False]

array

Represent xvg data as a (cached) numpy array.

The array is returned with column-first indexing, i.e. for a data file with columns X Y1 Y2 Y3 ... the array a will be a[0] = X, a[1] = Y1, ... .
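For instance, a file with columns X Y1 Y2 can be unpacked column by column (a sketch; the file name and number of columns are assumed):

xvg = XVG('data.xvg')
x, y1, y2 = xvg.array    # column-first: array[0] = X, array[1] = Y1, array[2] = Y2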

default_extension
Default extension of XVG files.
error

Error on the mean of the data, taking the correlation time into account.

See [FrenkelSmit2002] p526:

error = sqrt(2*tc*acf[0]/T)

where acf() is the autocorrelation function of the fluctuations around the mean, y-<y>, tc is the correlation time, and T the total length of the simulation.

[FrenkelSmit2002] D. Frenkel and B. Smit, Understanding Molecular Simulation. Academic Press, San Diego, 2002.
errorbar(**kwargs)

Quick hack: errorbar plot.

Set columns keyword to select [x, y, dy] or [x, y, dx, dy], e.g. columns=[0,1,2]. See XVG.plot() for details.
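A minimal sketch, assuming a file whose columns are x, y, dy:

xvg = XVG('observable.xvg')
xvg.errorbar(columns=[0, 1, 2])    # x, y, dy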

ma

Represent data as a masked array.

The array is returned with column-first indexing, i.e. for a data file with columns X Y1 Y2 Y3 ... the array a will be a[0] = X, a[1] = Y1, ... .

inf and nan are filtered via numpy.isfinite().
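Because invalid values are masked, the masked array can be passed directly to matplotlib; a sketch, assuming a two-column file with some nan/inf entries:

import matplotlib.pyplot as plt

xvg = XVG('noisy.xvg')
plt.plot(xvg.ma[0], xvg.ma[1])    # masked nan/inf entries are simply not drawn
plt.show()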

max
Maximum of the data columns.
mean
Mean value of all data columns.
min
Minimum of the data columns.
parse(stride=None)

Read and cache the file as a numpy array.

Store every stride-th line of data; if stride is None then the class default is used.

The array is returned with column-first indexing, i.e. for a data file with columns X Y1 Y2 Y3 ... the array a will be a[0] = X, a[1] = Y1, ... .
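A sketch of subsampling a large file (the stride values are arbitrary):

xvg = XVG('big.xvg', stride=10)    # keep only every 10th data line when first parsed
xvg.parse(stride=100)              # re-parse the associated file with a coarser stride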

plot(**kwargs)

Plot xvg file data.

The first column of the data is always taken as the abscissa X. Additional columns are plotted as ordinates Y1, Y2, ...

In the special case that there is only a single column then this column is plotted against the index, i.e. (N, Y).

Keywords:
columns : list

Select the columns of the data to be plotted; the list is used as a numpy.array extended slice. The default is to use all columns. Columns are selected after a transform.

transform : function

function transform(array) -> array which transforms the original array; must return a 2D numpy array of shape [X, Y1, Y2, ...] where X, Y1, ... are column vectors. By default the transformation is the identity [lambda x: x].

maxpoints : int

limit the total number of data points; matplotlib has issues processing png files with >100,000 points and pdfs take forever to display. Set to None if really all data should be displayed. At the moment we simply subsample the data at regular intervals. [10000]

kwargs

All other keyword arguments are passed on to pylab.plot().
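A sketch combining these keywords (the column indices, the ps-to-ns transform, and the color are only examples):

import numpy as np

xvg = XVG('energy.xvg')
# plot column 2 against the time in column 0 (converted from ps to ns),
# thinning the data to at most 5000 points
xvg.plot(columns=[0, 2],
         transform=lambda a: np.vstack([a[0] / 1000.0, a[1:]]),
         maxpoints=5000,
         color='black')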

read(filename=None)
Read and parse xvg file filename.
set(a)

Set the array data from a (i.e. completely replace).

No sanity checks at the moment...
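A minimal sketch of building an XVG instance from scratch (the data here are made up):

import numpy as np
from gromacs.formats import XVG

t = np.linspace(0, 100, 501)     # abscissa
y = np.sin(0.1 * t)              # one data column
xvg = XVG()
xvg.set(np.vstack([t, y]))       # column-first array [X, Y1]
xvg.write('sine.xvg')            # written in NXY format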

set_correlparameters(**kwargs)

Set and change the parameters for calculations with correlation functions.

The parameters persist until explicitly changed.

Keywords:
nstep

only process every nstep-th data point to speed up the FFT; if left empty a default is chosen that produces roughly 25,000 data points (or whatever is set in ncorrel)

ncorrel

If no nstep is supplied, aim at using ncorrel data points for the FFT; sets XVG.ncorrel [25000]

force

force recalculating correlation data even if cached values are available

kwargs

see numkit.timeseries.tcorrel() for other options
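A sketch of tuning the correlation analysis before reading off error and tc (the parameter values are arbitrary):

xvg = XVG('energy.xvg')
xvg.set_correlparameters(nstep=5, force=True)   # use every 5th data point, discard cached values
print(xvg.error, xvg.tc)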

std
Standard deviation from the mean of all data columns.
tc

Correlation time of the data.

See XVG.error() for details.

write(filename=None)

Write array to xvg file filename in NXY format.

Note

Only plain files are supported at the moment, not compressed ones.

class gromacs.formats.NDX(filename=None, **kwargs)

Gromacs index file.

Represented as an ordered dict where the keys are index group names and values are numpy arrays of atom numbers.

Use the NDX.read() and NDX.write() methods for I/O. Access groups by name via the NDX.get() and NDX.set() methods.

Alternatively, simply treat the NDX instance as a dictionary. Setting a key automatically transforms the new value into an integer 1D numpy array (not a set, as would be the make_ndx behaviour).

Note

The index entries themselves are ordered and can contain duplicates so that output from NDX can be easily used for g_dih and friends. If you need set-like behaviour you will have to use gromacs.formats.uniqueNDX or gromacs.cbook.IndexBuilder (which uses make_ndx throughout).

Example

Read index file, make new group and write to disk:

ndx = NDX()
ndx.read('system.ndx')
print(ndx['Protein'])
ndx['my_group'] = [2, 4, 1, 5]   # add new group
ndx.write('new.ndx')

Or quicker (replacing the input file system.ndx):

ndx = NDX('system')          # suffix .ndx is automatically added
ndx['chi1'] = [2, 7, 8, 10]
ndx.write()
format
standard ndx file format: ‘%6d’
get(name)
Return index array for index group name.
groups
Return a list of all groups.
ncol
standard ndx file format: 15 columns
ndxlist

Return a list of groups in the same format as gromacs.cbook.get_ndx_groups().

Format:
[{'name': group_name, 'natoms': number_atoms, 'nr': group_number}, ...]
read(filename=None)
Read and parse index file filename.
set(name, value)
Set or add group name as a 1D numpy array.
size(name)
Return number of entries for group name.
sizes
Return a dict with group names and number of entries.
write(filename=None, ncol=15, format='%6d')
Write index file to filename (or overwrite the file that the index was read from)
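Besides dictionary-style access, the methods and properties above can be used directly, e.g. (the group names are examples):

ndx = NDX('system')                 # suffix .ndx is automatically added
print(ndx.groups)                   # names of all index groups
print(ndx.size('Protein'))          # number of atoms in one group
ndx.set('ligand', [100, 101, 102])  # same as ndx['ligand'] = [100, 101, 102]
ndx.write('system_ligand.ndx')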
class gromacs.formats.uniqueNDX(filename=None, **kwargs)

Index that behaves like make_ndx, i.e. entries behave as sets, not lists.

The index lists behave like sets:

  • adding sets with '+' is equivalent to a logical OR: x + y == "x | y"
  • subtraction '-' is AND: x - y == "x & y"
  • see join() for ORing multiple groups (x+y+z+...)

Example:

I = uniqueNDX('system.ndx')
I['SOLVENT'] = I['SOL'] + I['NA+'] + I['CL-']
join(*groupnames)

Return an index group that contains atoms from all groupnames.

The method will silently ignore any groups that are not in the index.

Example

Always make a solvent group from water and ions, even if not all ions are present in all simulations:

I['SOLVENT'] = I.join('SOL', 'NA+', 'K+', 'CL-')        
class gromacs.formats.GRO(**kwargs)

Class that represents a GROMOS (gro) structure file.

File format:

(Not implemented yet)

read(filename=None)
Read and parse gro structure file filename.
