Setting up a protein for simulations - 2008

Phil Biggin philip.biggin@bioch.ox.ac.uk

Table of Contents.

About this practical
Practicalities
Tute 1 - Let's go!
Further Reading
Appendix I - Basic Unix/Linux Commands
Appendix II - Useful Packages
Acknowledgements

About this Practical.

http://sbcb.bioch.ox.ac.uk/phil/teaching/ccpb-prac.html

This practical is designed to demonstrate the sorts of questions and problems that arise when setting up your protein for simulation. Most of the steps involved can be carried out through simple web pages. Very little prior knowledge is assumed.

Practicalities.

You will need the basic Unix commands listed in Appendix I and the molecular graphics program VMD (see Appendix II).

Let's go!

% mkdir setting-up

When you have downloaded a PDB file from the Protein Data Bank it's always a good idea to look at it both graphically and as a text file. The former will point out 3D issues immediately, whilst the latter will tell you what the crystallographers actually did and what that might mean for your simulation. Let us look at this protein using VMD:-

% vmd 1KYN.pdb

"is that biologically relevant?". How do you know?

Sometimes, but not always, the crystallographers will have determined the biologically relevant unit by other means. In that case you may see lines like the following in the PDB file:-

REMARK 350 BIOMOLECULE: 1                                                       
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: MONOMERIC                         
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A                                     
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000            
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000            
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
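
If you would rather not hunt for these lines by eye, the sketch below pulls the same information out of the REMARK 350 block. It is only an illustration: the function name biological_units is made up for this practical, and it assumes the standard layout shown above (one "BIOMOLECULE:" line per unit, followed by the chain list and the BIOMT rows).

```python
def biological_units(pdb_lines):
    """Summarise the REMARK 350 block of a PDB file.

    Returns {biomolecule number: {"author": ..., "chains": [...],
    "n_transforms": ...}}. Hypothetical helper, not part of any package.
    """
    units = {}
    current = None
    for line in pdb_lines:
        if not line.startswith("REMARK 350"):
            continue
        text = line[10:].strip()
        if text.startswith("BIOMOLECULE:"):
            # Assumes the usual "BIOMOLECULE: 1" form.
            current = int(text.split(":", 1)[1].split(",")[0])
            units[current] = {"author": None, "chains": [], "n_transforms": 0}
        elif current is None:
            continue  # free-text header lines before the first unit
        elif text.startswith("AUTHOR DETERMINED BIOLOGICAL UNIT:"):
            units[current]["author"] = text.split(":", 1)[1].strip()
        elif text.startswith(("APPLY THE FOLLOWING TO CHAINS:", "AND CHAINS:")):
            chains = text.split(":", 1)[1].split(",")
            units[current]["chains"] += [c.strip() for c in chains if c.strip()]
        elif text.startswith("BIOMT1"):
            # Each transformation matrix starts with one BIOMT1 row.
            units[current]["n_transforms"] += 1
    return units
```

You could call it as biological_units(open("1KYN.pdb")). A unit with one transformation applied to a single chain corresponds to the monomeric case shown above.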

We can also use the graphics program to determine whether there are any non-protein groups (including ions, waters and drug molecules) bound. In VMD select Graphics --> Representations, which will bring up another dialogue box. Click on "Create Representation" and type "not protein" into the selected atoms box. Then change the drawing method to VDW. You should see that in each monomer there is a drug molecule bound. Parameterizing drug molecules is a whole tutorial in itself, so for this practical we will remove it later.
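
The same check can be made on the text file. The sketch below simply counts HETATM atoms per residue; hetero_groups is a hypothetical name, and the column positions assume the standard fixed-width PDB format. Note that HETATM is not quite the same thing as VMD's "not protein" selection (modified amino acids, for instance, are usually stored as HETATM), so treat the two as complementary views.

```python
from collections import Counter


def hetero_groups(pdb_lines):
    """Count HETATM atoms per (residue name, chain ID, residue number)."""
    groups = Counter()
    for line in pdb_lines:
        if line.startswith("HETATM") and len(line) >= 26:
            resname = line[17:20].strip()   # columns 18-20
            chain = line[21]                # column 22
            resseq = line[22:26].strip()    # columns 23-26
            groups[(resname, chain, resseq)] += 1
    return groups
```

For example, hetero_groups(open("1KYN.pdb")) gives one entry per water, ion or ligand, and the residue names tell you which entries are the drug molecules.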

If you look carefully at this protein you will notice that there are no hydrogen atoms (protons). That is because at resolutions worse than about 1.0 Angstrom it is difficult to see the electron density for them.
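
A quick way to confirm this without the graphics is to count the hydrogens in the file. The following is a rough sketch (count_hydrogens is a made-up name): it trusts the element column when it is filled in and otherwise falls back on the atom name, which is an assumption that can misfire for heavy atoms whose names begin with H.

```python
def count_hydrogens(pdb_lines):
    """Count hydrogen (or deuterium) atoms in ATOM/HETATM records."""
    n_hydrogens = 0
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            continue
        element = line[76:78].strip().upper()   # columns 77-78
        if element:
            if element in ("H", "D"):
                n_hydrogens += 1
        else:
            # Fallback for files without an element column: names such as
            # "1HB" or "HB1" are taken to be hydrogens (an assumption).
            name = line[12:16].strip().lstrip("0123456789")
            if name.startswith("H"):
                n_hydrogens += 1
    return n_hydrogens
```

For a typical crystal structure count_hydrogens(open("1KYN.pdb")) should return zero, and the same call on a file with protons added should not.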

Finally, let us take a look at residue 239. See if you can figure out how to display residue 239 in CPK format.

What do you notice about it?

% nedit 1KYN.pdb

What do you notice in that region of the file?

Most simulation packages will ignore this insertion code and simply renumber the residues starting from 1, so bear in mind that the residue numbers in your simulation files may no longer match those in the PDB file.
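
To see in advance which residues will be affected by such renumbering, you can list every residue that carries an insertion code. This is a minimal sketch under the assumption of standard fixed-width PDB columns (the insertion code sits in column 27); the function name is invented for illustration.

```python
def residues_with_insertion_codes(pdb_lines):
    """Return (chain, number + insertion code, residue name) tuples,
    in file order, for residues that carry an insertion code."""
    found = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or len(line) <= 26:
            continue
        icode = line[26]                       # column 27
        if icode != " ":
            key = (line[21], line[22:26].strip() + icode, line[17:20].strip())
            if key not in found:
                found.append(key)
    return found
```

An empty list means the numbering is plain; anything else is worth noting down before the simulation package renumbers it.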

In our case we want to simulate the monomer without any compounds in the binding site. The first issue is how to extract the monomer, and there are a couple of ways to do this. The old-fashioned way is to open the file in nedit and delete all the ATOM records that are not needed (i.e. those of chain B, for example). Think about this first, though: not all chains are equal, and you may make things harder for yourself if the chain you keep is the one with missing backbone atoms, for example.
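
The manual edit can also be scripted. The sketch below is one way to do it, not the only one: atoms_per_chain gives a crude hint as to which chain is more complete, and extract_chain keeps the coordinate records of a single chain. Both names are hypothetical, and records such as CONECT are passed through untouched, so check the result in VMD afterwards.

```python
from collections import Counter

# Records that carry a chain ID in column 22.
COORD_RECORDS = ("ATOM", "HETATM", "ANISOU", "TER")


def atoms_per_chain(pdb_lines):
    """Count ATOM records per chain ID."""
    counts = Counter()
    for line in pdb_lines:
        if line.startswith("ATOM") and len(line) > 21:
            counts[line[21]] += 1
    return counts


def extract_chain(pdb_lines, chain_id):
    """Keep coordinate records of one chain; pass other records through."""
    kept = []
    for line in pdb_lines:
        if line.startswith(COORD_RECORDS):
            if len(line) > 21 and line[21] == chain_id:
                kept.append(line)
        else:
            kept.append(line)
    return kept
```

A chain with noticeably fewer atoms than its partner probably has missing residues or side chains, which is a reason to prefer the other one.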

An alternative is to use the www.rcsb.org site, which has a download option that will give you the biologically relevant unit (where known). We will do the latter. Enter 1KYN in the PDB search box and then, from the download options, select the one called "Biological Unit 1 gz" and download it into your setting-up directory. Uncompress it if necessary (e.g. with gunzip) and call it 1kyn-biounit.pdb. Always check that it really has given you what you expected, which means looking at it in VMD again.

We now have the correct biological unit, but the ligand molecule is still present. We need to get rid of it. The easiest way is to open the file in your favourite editor and delete the HETATM lines.
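
If you prefer not to delete lines by hand, the same edit takes a few lines of code. This is a sketch that mirrors the manual step, with strip_hetatm being a made-up name: it drops every HETATM record, which also removes crystallographic waters and ions unless you list their residue names in keep_resnames.

```python
def strip_hetatm(pdb_lines, keep_resnames=()):
    """Return the lines with HETATM records removed.

    Residue names listed in keep_resnames (e.g. ("HOH",)) are kept.
    """
    kept = []
    for line in pdb_lines:
        if line.startswith("HETATM") and line[17:20].strip() not in keep_resnames:
            continue
        kept.append(line)
    return kept
```

Writing the returned lines to a new file (rather than overwriting 1kyn-biounit.pdb) leaves you with the original to go back to if something looks wrong in VMD.
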

By this stage you should have a file containing a monomer with no drug molecule bound. The next stage is to build in the missing atoms and add the protons. Fortunately for you guys, this has all been rather well automated by the What-If server:-

http://swift.cmbi.ru.nl/servers/html/index.html

We will use the server that builds, checks and repairs PDB files. Click on "Build/Check/Repair model" and then select "Prepare PDB file for docking programs".

Upload the apo, biologically relevant structure that you have just prepared and submit the request. It should take less than a minute to process, after which you should click on done and download the resulting prepdock.pdb file.

Open this file in VMD and examine residue 239 (type resid 239 into the selection box), which was previously not built by the crystallographers.

Does the rotamer position look sensible?

Also examine HIS57 (resid 57 in the VMD selection box).

What do you notice about this histidine compared to the other histidine residues in the structure? Does it appear sensible?
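
To compare the histidines systematically, you can check which ring nitrogens carry a proton in the protonated file. The sketch below assumes the ring hydrogens are named HD1 (on ND1) and HE2 (on NE2), as in the standard PDB nomenclature; if your file uses other hydrogen names the labels will be wrong. The function name histidine_states is hypothetical.

```python
def histidine_states(pdb_lines):
    """Classify each HIS residue by which ring N-H atoms are present.

    Returns {(chain, residue number): "delta" | "epsilon" | "both" | "none"}.
    """
    ring_h = {}
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")) or line[17:20] != "HIS":
            continue
        key = (line[21], line[22:27].strip())   # number plus insertion code
        present = ring_h.setdefault(key, set())
        name = line[12:16].strip()
        if name in ("HD1", "HE2"):
            present.add(name)
    labels = {
        frozenset(): "none",
        frozenset({"HD1"}): "delta",
        frozenset({"HE2"}): "epsilon",
        frozenset({"HD1", "HE2"}): "both",
    }
    return {key: labels[frozenset(h)] for key, h in ring_h.items()}
```

A residue labelled "both" is doubly protonated and therefore positively charged, so it is worth asking whether that is sensible for its environment.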

You should now have a protein structure that has all hydrogens added and is ready to proceed to the next stage of the practical. If you finish early, pick one of the following proteins and summarize any potential problems/errors associated with its PDB file:-

1VRH
1AQW
1DG5
1EI1
1HFC
1LCP
1LMO
1OKL
6ABP
1MTW
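
As a starting point for that exercise, the sketch below gathers a few of the warning signs discussed in this practical from a PDB file. It is deliberately crude and the function name is invented: it only reports whether REMARK 465 (missing residues) and REMARK 470 (missing atoms) sections exist, how many atoms have alternate locations, and which chains and hetero groups are present. It is no substitute for reading the header and looking at the structure.

```python
def quick_pdb_report(pdb_lines):
    """Collect a few simple warning signs from a PDB file."""
    has_465 = has_470 = False
    altloc_atoms = 0
    chains, hetero = set(), set()
    for line in pdb_lines:
        if line.startswith("REMARK 465"):
            has_465 = True          # missing residues are listed here
        elif line.startswith("REMARK 470"):
            has_470 = True          # missing atoms are listed here
        elif line.startswith(("ATOM", "HETATM")) and len(line) > 21:
            chains.add(line[21])
            if line[16] != " ":     # column 17: alternate location
                altloc_atoms += 1
            if line.startswith("HETATM"):
                hetero.add(line[17:20].strip())
    return {
        "has_remark_465": has_465,
        "has_remark_470": has_470,
        "altloc_atoms": altloc_atoms,
        "chains": sorted(chains),
        "hetero_resnames": sorted(hetero),
    }
```

For example, quick_pdb_report(open("1VRH.pdb")) would work on a file you have downloaded yourself; the filename here is just an illustration.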

That concludes this practical. I welcome all comments (including negative ones) and/or suggestions - please feel free to email philip.biggin@bioch.ox.ac.uk.

Appendix I - Basic Unix/Linux Commands.

ls -lrt	provides a "long" listing of the files in the current directory in reverse time order (most recently modified last).
cd dir	change directory to the directory 'dir'
pwd	print the current working directory on the screen
rm file	delete (remove) 'file'
mv file newfile	rename file to newfile
cat file	print the contents of file to the screen
more file	page through the contents of file one screen at a time.

Appendix II - Useful Packages.

Rasmol The simplest and fastest way to look at a structure (but no trajectories).
VMD The best free tool for looking at dynamics.
Pymol Another open-source project. Has good graphics.
MolMol Designed from an NMR point of view, and has some nice novel ways of displaying information (e.g. sausage plots).
swiss-pdb viewer Another free viewer
gromacs A free, open-source (GPL) molecular simulation package.
xmgrace The free data plotting program.