Phil Biggin philip.biggin@bioch.ox.ac.uk
This practical is designed to demonstrate what sorts of questions and problems occur when simulating proteins. It is divided into two parts. The first is designed to show you just how easy it is to run a protein simulation and perform some basic analysis. The second involves a little more exploration into some typical problems. As this practical course stems from the Wellcome Trust Structural Biology programme, students come from an extremely wide and varied range of backgrounds, and I therefore make very little assumption as to previous knowledge. The practical is both open-ended and in the analysis part is fairly independent of previous steps which means you can opt out of certain exercises if you wish. However I would recommend proceeding sequentially through all of it. The instructions get progressively less detailed as this would result in a very lengthy document, but if you get stuck simply ask. Finally, I assume very little knowledge of gromacs and VMD so if you are familiar with those packages already you will find things very easy.
If you are worried you have done something wrong or are not sure what the actual result should look like, you can look at the results directory within each section which has example output.
We will be using the molecular dynamics simulation package GROMACS. Extensive help is available online there both in html and as complete downloadable pdfs of the manual. The manual is worth downloading if you wish to understand some of the implementations of the algorithms and if you need further background into the subject of molecular dynamics.
NOTE: This trajectories and the practical are all based on using gromacs version 5.1.4 If you are using an older version then some things probably will not work. The VMD "directions" are based on version 1.9.1.
First we need to obtain all the files for the practical. Depending on where you are performing this practical they might have already been installed for you. If so then there should be a directory called wt-prac somewhere and you can proceed to the step below that asks you to check the gromacs installation. If you do not have the files installed then these can be retrieved with your browser from:
http://sbcb.bioch.ox.ac.uk/phil/teaching/wt-prac-2018-12-10.tar.gz
Now, let us unpack the practical archive in your home directory:-
% cd % mv Downloads/wt-prac-2018-12-10.tar.gz ./ [or if you saved it to another location move it from there] % tar xvfz wt-prac-2018-12-10.tar.gzThis should then expand various directories and files for use in the practical. Specifically it will create a directory called wt-prac. You should change directory to this directory and work from there as the commands that follow assume you are starting from that location.
% cd wt-prac
Next, make sure you have the GROMACS environment setup.
% source /usr/local/gromacs/bin/GMXRC % gmx grompp -hshould display some running information for the grompp program. If it does not contact a practical demonstrator. The -h option applies to all gromacs programs, so if you want to more about a certain program this will tell you about which options are valid. Indeed, the practical below is really only a route to show you what sorts of things are possible. You should explore the options available at each stage by giving the -h option.
If you look at this file (using the graphical editor nedit for example):-
% cd setupIf you are not familiar with VMD, now is a good time to fire it up and get used to performing some routine manipulations. You should immediately see that it has two chains:- (the following file name might be lowercase depending on how you downloaded it).
%vmd 1HSG.pdb
If the internet is down or you cannot find 1HSG then you can use a pre-downloded copy of that which is to be found in the data directory. In fact, most of the files you need can be found in this directory so if you get confused/stuck/lost etc then you can always look here to check you did things correctly.
You should experiment with the menu system and try various representations of the protein such as trace, cartoon and ribbons for example.
Exit vmd, either by clicking on File-->Quit or typing quit in the terminal box. Now we need to prepare our protein for simulation. First of all we will extract only the coordinates from the pdb file. To do this enter the following command:-
% grep ATOM 1HSG.pdb > protein.pdbFirst of all we need to make sure that all the hydrogens are added to our protein. This process will also generate the parameter/topology file we need.
% gmx pdb2gmx -f protein.pdb -ignh -o protein_gmx.pdbThe program should run and present a list of force-fields from which to select. Select the GROMOS96 43A1 force field which should be option 9 in the list followed by 1 to select SPC water. If all goes well this should generate several files: 1. topol.top 2. topol_Protein_chain_A.itp 3. topol_Protein_chain_B.itp 3. protein_gmx.pdb and 4. posre_Protein_chain_A.itp. 5. posre_Protein_chain_B.itp Note that the protein has a net charge of +4e. You should see a line that says "Total charge in system 4.000 e".
Now we have hydrogens we can also ask questions about hydrogen bonding. Use vmd again and...
exit vmd
Before we can add water we need to define a box in which to put the protein and the water:-
% gmx editconf -f protein_gmx.pdb -box 7 7 7 -c -o boxed.pdbThis puts the protein in the centre of the box that is 7 nm x 7 nm x 7 nm and creates the resulting file boxed.pdb. It is always a good idea to check that this operation produces the desired result.
% vmd boxed.pdb -e ../data/pbcbox.tcl
Next we need to add water to the system. We will also add ions to the system - enough to neutralize the system and to a concentration that is representative of the cell. Now we can add the water by repeatedly overlaying a small (216 molecules) box of water into the system.
% gmx solvate -cp boxed.pdb -cs -o solvated.pdb -p topol.top
You may have noticed in some of the output generated that the total system charge is +4. In order for us to use an Ewald method to calculate the electrostatic interactions we need to have a neutral system overall. Therefore we will add counterions (chloride ions, in this case) using the option -neutral and enough ions to make the solution up to 150 mM (-conc 0.15). This is done by replacing random water molecules (SOL) with NA+ or CL- ions.
% gmx grompp -c solvated.pdb -p topol.top -f ../data/genion.mdp -o genion.tpr % gmx genion -s genion.tpr -conc 0.15 -neutral -pname NA -nname CL -o system.gro -p topol.topWhen prompted, enter the group that corresponds to SOL (should be 13 or thereabouts).
Visualise the system using vmd and convince yourself that the system has been solvated and ions have been placed at random locations
% vmd system.gro
Before we can run the actual dynamics, we need to first minimize the system. Ideally you would minimize down until the forces were below a certain level (tolerance), but we will just give a quick burst of 200 steps here.
% cd ../run % gmx grompp -c ../setup/system.gro -p ../setup/topol.top -f ../data/em.mdp -o em.tpr % gmx mdrun -deffnm em -vWe can examine the minimization in terms of the potential energy versus the number of minimization steps:
% gmx energy -s em.tpr -f em.edr -o em_potential_energy.xvgtype in 10 when prompted (should correspond to potential energy from the list of options presented), then a zero to finish. Next we can draw the graph/plot using xmgrace:-
% xmgrace em_potential_energy.xvg
At this stage, we would normally run a short simulation where the protein atoms are restrained while the water molecules and ions are allowed to freely move around and equilibrate around the protein. For this tutorial, we will skip this bit due to limited time. Now finally let us perform some molecular dynamics:-
% gmx grompp -c em.gro -p ../setup/topol.top -f ../data/md.mdp -maxwarn 1 -o md.tpr % gmx mdrun -deffnm md &At the moment it is set up to run for 100 ps. This will take at least 30 minutes to complete depending on the speed of your system - time for lunch!. You don't have to wait for it to finish completely though, although now might be a good time for a break to allow at least some data to appear. The analysis can be done on the files as they appear or you can always use the "one I made earlier" in the directory prerun/run (This is 1000 ps simulation of the same system).
% vmdWhen it has finished placing all the windows on the screen. Click on "File" in the VMD main menu window and select "new molecule". The Molecule File Browser window should appear. Click on "browse" then select setup and finally select em.gro (ie the file you made that has protein system energy minimized). Click "OK" and then click "Load". It should load up the starting coordinates into the main window. Then click browse in Molecule File Browser window. Select md.xtc. Select "OK" and then hit "load". Alternatively locate the longer pre-run xtc file made in the backup directory and load that. The trajectory should start loading into the main VMD window. Although things will be moving, you can see that its quite difficult to visualize the individual components. That is one of the problems with simulating such large a complicated systems. VMD makes it quite easy to look at individual components of a system. For example, let us consider the protein only. On the main menu, left click on graphics and select "representations". A new menu will appear (Graphical Representations). In the box entitled "selected atoms" type protein and hit enter. Only those atoms that form part of the protein are now selected. Various other selections and drawing methods will help to visualize different aspects of the simulation. More help on the selections is available here.
Now that we are sure the simulation is not doing anything ridiculous, we can start to ask questions about the simulation. The first thing to establish is whether the simulation has equilibrated to some state. So what are some measures of the system being equilibrated? and what can we use to test the reliability of the simulation?
% vmd -f ../prerun/run/em.gro ../prerun/run/md.xtc &
% cd ../analysis % gmx energy -f ../prerun/run/md.edr -s ../prerun/run/md.tpr -o 1hsg_temperature.xvgThe program will then present you with a large table of all the values recorded in the energy (.edr) file. We want to examine temperature so type 12, press enter and then 0 and press enter again. The program will then analyse the temperature and present some statistics of the analysis at the end. We can look at the fluctuations in more detail by using xmgrace (or xmgr).
% xmgrace 1hsg_temperature.xvgwill display the temperature fluctuations.
Another set of properties that is quite useful to examine is the various energetic contributions to the energy. The total energy should be constant. but the various contributions can change and this can sometimes indicate something interesting or strange happening in your simulation. Let us look at some energetic properties of the simulation.
% gmx energy -s ../prerun/run/md.tpr -f ../prerun/run/md.edr -o 1hsg_energies.xvgWe shall select short-range lennard jones (6), short range coloumbic (7) and the potential energy (9). End your selection with a zero. (Note: If you are unfamiliar with xmgrace you may find it easier to make these as three separate files instead of saving everything in 1hsg_energies.xvg.) Look at the resulting graph with xmgrace again.
% xmgrace -nxy 1hsg_energies.xvg
Now that we are happy that the system is OK, let us now consider the root mean square deviation (RMSD). We shall use the tool g_rms
% gmx rms -s ../prerun/run/em.gro -f ../prerun/run/md.xtc -o 1hsg_rms.xvg
We will use the Cα coordinates to do the fitting to, so select group 3 when prompted. Then enter 3, to look at the RMSD of all Cα atoms. Have a look at the resulting graphs with xmgrace.
% gmx rmsf -s ../prerun/run/em.gro -f ../prerun/run/md.xtc -b 200 -oq 1hsg_bfac.pdb -o 1hsg_rmsf.xvgselect 3 when prompted to obtain b-factors for Cα atoms. NOTE: This will omit the first 200 ps (where the simulation is less stable). For this part you'll probably want to use the pre-run data. This will produce a pdb file (1hsg_bfac.pdb) that has b-factors in the 10th column. We can extract this data ready for plotting in xmgrace as followings:-
% grep CA 1hsg_bfac.pdb | awk '{print $5, $10}' > 1hsg_bfac_ca.datLook at this data in xmgrace.
We can compare this simulation data directly with experimental b-factors from the crystal structure of protease. Read in the following ascii data file into xmgrace:-
../data/1hsg_bfac_ca_expt.dat
% mkdir salt-bridge % cd salt-bridge % gmx saltbr -f ../../prerun/run/md.xtc -s ../../prerun/run/md.tpr -t 4.0 -sepThis will generate a lot of files. They are labelled according to residues making the salt-bridge. Identify the ones that correspond to the ones you observed in the crystal between aspartates and arginines and examine the graphs using xmgrace.
Some things to think about:-
In the last part of the practical you should explore other types of analysis. Its worth looking at some of the other standard gromacs analysis tools. If you type g_ and then press tab it should list all the gromacs analaysis tools. Typing any of these program names followed by the -h flag will tell you what that program does and that should help you decide whether its the tool for the job.
As a follow up, how might you calculate/show how water molecules behave in the vicinity of the protein? Are they always tightly bound? etc.. etc..
That concludes this practical. I welcome all comments (including negative ones) and/or suggestions - please feel free to email philip.biggin@bioch.ox.ac.uk.
The texts recommended here are the same as those mentioned in the lecture:-
ls -lrt | provides a "long" list of all files in the current directory in reverse order of time. |
cd dir | change directory to the directory 'dir' |
pwd | print the current working directory on the screen |
rm file | delete (remove) 'file' |
mv file newfile | rename file to newfile |
cat file | print the contents of file to the screen |
more file | print the contents of file to the screen but with more navigation possible. |
apropos arguement | tells everything that relates to arguement |
man command | Pulls up the manual page for command |
All of these manipulations can be accomplished very easily with the trjconv program. For example
gmx trjconv -f file.xtc -s file.tpr -b 5000 -o file_last_5ns_only.xtcwill write out the last five nanoseconds from this 10ns trajectory.
To write out less frequently, use the skip option. For example to write out the 10ns trajectory but
every 20ps instead of every 5:
gmx trjconv -f file.xtc -s file.tpr -o file_less_sample.xtc -skip 4This option is probably the best one as it gives you visualization across the whole trajectory (although obviously you should be aware that you might miss some very short timescale motions).
The program will prompt you selections so writing out just protein for example is very straightforward.
If your selection is not there, then you should include an index file. You can add your selection to the index file, by using the make_ndx program.
I also thank the Wellcome Trust and the Oxford Supercomputing Centre.