Phil Biggin philip.biggin@bioch.ox.ac.uk
This practical is designed to demonstrate the sorts of questions and problems that arise when setting up your protein for simulation. Most of the steps involved can be carried out through simple web pages. Very little previous knowledge is assumed.
% mkdir setting-up
Start the web browser (Firefox or whatever) and go to the Protein Data Bank (www.rcsb.org). Enter the PDB code 1KYN in the site search box at the top and hit the site search button. The protein should come up. Select download from the left-hand column and save the file to the setting-up directory.
When you have downloaded a PDB file from the Protein Data Bank it's always a good idea to look at it both graphically and as a text file. The former will point out 3D issues immediately, whilst the latter will tell you what the crystallographers actually did and what that might mean for your simulation. Let us look at this protein using VMD:-
% vmd 1KYN.pdb
The first thing you should notice is that this protein is a dimer. The first question you should ask yourself is whether the dimer is the biologically relevant unit.
Sometimes, but not always, the crystallographers will have determined the biologically relevant unit by other means, in which case you might see the following lines in the PDB file:-
REMARK 350 BIOMOLECULE: 1
REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: MONOMERIC
REMARK 350 APPLY THE FOLLOWING TO CHAINS: A
REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000
REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000
REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000
You can see here that the monomer is the biologically relevant unit. Sometimes this information is not present, in which case you either have to rely on the literature or use the PQS server, which tries to build the most likely quaternary structure for you.
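If you prefer the command line, the same information can be pulled out of the file without opening an editor. The following is a minimal sketch, assuming a standard fixed-column PDB file; the function names biounit_remarks and chain_ids are illustrative and are not part of any package.

```shell
# Print the biological-unit records (REMARK 350) from a PDB file.
biounit_remarks() {
    grep '^REMARK 350' "$1"
}

# List the distinct chain identifiers found in ATOM records.
# In the fixed-column PDB format the chain ID is in column 22.
chain_ids() {
    awk '/^ATOM  / { print substr($0, 22, 1) }' "$1" | sort -u
}
```

For example, `biounit_remarks 1KYN.pdb` shows what the authors recorded about the biological unit, and `chain_ids 1KYN.pdb` lists the chains present in the deposited coordinates. For a structure deposited as a dimer you would expect to see two chain identifiers.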
We can also use the graphics program to determine whether there are any non-protein groups (including ions, waters of solvation and drug molecules) bound. In VMD select Graphics --> Representations, which will bring up another dialogue box. Click on "Create Representation" and type "not protein" into the selected atoms box. Then change the drawing method to VDW. You should see that in each monomer there is a drug molecule bound. Parameterizing drug molecules is a whole tutorial in itself, so for this practical we will remove it later.
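A text-based cross-check of the graphical view is to tally the HETATM records, which is where non-protein groups are normally stored. This is a sketch only, assuming standard PDB columns (residue name in columns 18-20, chain ID in column 22); hetero_groups is an illustrative name.

```shell
# Count HETATM atoms per residue name and chain.
# Each output line reads: residue-name chain atom-count
hetero_groups() {
    awk '/^HETATM/ { n[substr($0, 18, 3) " " substr($0, 22, 1)]++ }
         END { for (k in n) print k, n[k] }' "$1" | sort
}
```

Running `hetero_groups 1KYN.pdb` gives one line per hetero group and chain. Waters normally appear as HOH, and anything else is worth looking up in the header before you decide to delete it.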
If you look carefully at this protein you will notice that there are no hydrogen atoms. That is because at resolutions worse than about 1.0 Angstrom, it is difficult to see the electron density for them.
Finally, let us take a look at residue 239. See if you can figure out how to display residue 239 in CPK format.
% nedit 1KYN.pdb
You can see that most of what we highlighted above is also described in the header section of the PDB file, along with a whole lot more information, including how the structure was refined and the conditions for crystallization. You should also scroll down to residue 36.
Most simulation packages will ignore the insertion code (the letter that follows the residue number, in column 27 of the ATOM record) and simply renumber the residues starting from 1, so you should keep that in mind.
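To see in advance which residues will be affected by renumbering, you can list every residue that carries an insertion code. The sketch below assumes standard PDB columns (chain ID in column 22, residue number in columns 23-26, insertion code in column 27); insertion_residues is an illustrative name.

```shell
# List residues whose insertion-code column (27) is not blank.
# Each output line reads: chain residue-number+insertion-code, e.g. "A 36A".
insertion_residues() {
    awk '/^(ATOM  |HETATM)/ && length($0) >= 27 && substr($0, 27, 1) != " " {
             printf "%s %d%s\n", substr($0, 22, 1), substr($0, 23, 4), substr($0, 27, 1)
         }' "$1" | uniq
}
```

Try `insertion_residues 1KYN.pdb`. Every residue that follows one listed here will end up with a different number once a simulation package renumbers from 1, so keep a note of the mapping if you need to compare with the literature.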
In our case we want to simulate the monomer without any compounds in the binding site. The first issue is how to extract the monomer. There are a couple of ways to do this. The old-fashioned way is simply to open the file in nedit and delete all ATOM records that are not applicable (i.e. chain B, for example, but think about this - not all chains are equal, and you may make things harder for yourself if you delete the chain that is complete and keep one that has missing backbone atoms, for example).
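The hand-editing route can also be scripted. The following is a rough sketch rather than a complete tool: it assumes standard PDB columns (chain ID in column 22), filters only the coordinate-type records, and passes every other line (header, remarks, CONECT and so on) through unchanged. The name extract_chain and the output file name in the example are illustrative.

```shell
# Keep coordinate records (ATOM, HETATM, ANISOU, TER) only for the chosen chain.
# Usage: extract_chain file.pdb A > chainA.pdb
extract_chain() {
    awk -v c="$2" '
        /^(ATOM  |HETATM|ANISOU|TER   )/ {
            if (substr($0, 22, 1) == c) print
            next
        }
        { print }   # all other records are copied unchanged
    ' "$1"
}
```

For example, `extract_chain 1KYN.pdb A > 1kyn-chainA.pdb`. As with editing by hand, check the result in VMD afterwards.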
An alternative is to use the www.rcsb.org site, which has a download option that will give you the biologically relevant unit (where known). We will do the latter. Enter 1KYN in the PDB search query and then, from the download options, select the one called "Biological Unit 1 gz" and download it into your setting-up directory. Call it 1kyn-biounit.pdb. You should of course always check that it really has given you what you expected, and that means checking with VMD again.
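The "gz" in the download option means the file arrives gzip-compressed, so it needs uncompressing before VMD or an editor can read it. A minimal sketch follows; the function name unpack_biounit and the downloaded file name used in the example are assumptions, so substitute whatever name your browser actually saved.

```shell
# Uncompress a gzipped download to a new file, leaving the original in place.
# Usage: unpack_biounit downloaded.gz output.pdb
unpack_biounit() {
    gunzip -c "$1" > "$2"
}
```

For example, `unpack_biounit 1kyn.pdb1.gz 1kyn-biounit.pdb`, where 1kyn.pdb1.gz is a guess at the downloaded name.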
We now have the correct biological unit, but we still have the ligand molecule present in this case. We need to get rid of that.
The easiest way is to simply open the file with your favourite graphical editor and remove the HETATM lines.
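If you would rather not delete lines by hand, the same result can be had with a one-line filter. This is a sketch under the assumption that everything stored as HETATM should go; strip_hetatm and the output file name in the example are illustrative.

```shell
# Write a copy of a PDB file with every HETATM record removed.
# Usage: strip_hetatm in.pdb > out.pdb
strip_hetatm() {
    grep -v '^HETATM' "$1"
}
```

For example, `strip_hetatm 1kyn-biounit.pdb > 1kyn-apo.pdb`. Be aware that this removes crystallographic waters and ions as well as the ligand, and that in some PDB files modified amino acids (selenomethionine, for example) are also stored as HETATM records, so check what you are about to delete first.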
By this stage you should have a file with a monomer and no drug molecules bound. The next stage is to build in the missing atoms and add the protons. Fortunately for you guys this has all been rather well automated by the What-If server:-
http://swift.cmbi.ru.nl/servers/html/index.html
We will use the server that builds/checks/repairs pdbs. Click on "Build/Check/Repair model" and then select "Prepare PDB file for docking programs".
Upload your recently prepared apo biologically relevant structure and submit the request. It should take less than 1 minute to process and then you should click on done and download the resulting prepdock.pdb file.
Open this file in VMD and examine residue 239 (type resid 239 into the selection box), which was previously not fully built by the crystallographers.
Also examine HIS57 (resid 57 in vmd selection).
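A quick way to confirm what the server changed is to list the atom names of a residue before and after processing. The sketch below assumes standard PDB columns (atom name in columns 13-16, residue number in columns 23-26) and a single chain; residue_atoms is an illustrative name, and it ignores insertion codes, so residues such as 36 and 36A would be reported together.

```shell
# Print the atom names of one residue, one per line.
# Usage: residue_atoms file.pdb 239
residue_atoms() {
    awk -v r="$2" '/^ATOM  / && substr($0, 23, 4) + 0 == r {
             print substr($0, 13, 4)
         }' "$1"
}
```

Comparing `residue_atoms 1kyn-biounit.pdb 239 | wc -l` with `residue_atoms prepdock.pdb 239 | wc -l` shows how many atoms were added to residue 239. Running `residue_atoms prepdock.pdb 57` lists the atoms of HIS57. Which ring hydrogens are present (named HD1 and HE2 in the usual PDB convention) tells you the protonation state that was chosen.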
You should now have a protein structure that has all hydrogens added and is ready to proceed to the next stage of the practical. If you finish early, pick one of the following proteins and summarize any potential problems/errors associated with its PDB file:-
1VRH
1AQW
1DG5
1EI1
1HFC
1LCP
1LMO
1OKL
6ABP
1MTW
That concludes this practical. I welcome all comments (including negative ones) and/or suggestions - please feel free to email philip.biggin@bioch.ox.ac.uk.
PDB Checkers
Gerard Kleywegt has an excellent webpage on pdb errors:-
pKa calculations
Some excellent simulation references
ls -lrt | provides a "long" listing of the files in the current directory, sorted by modification time with the most recent last. |
cd dir | change directory to the directory 'dir' |
pwd | print the current working directory on the screen |
rm file | delete (remove) 'file' |
mv file newfile | rename file to newfile |
cat file | print the contents of file to the screen |
more file | print the contents of file to the screen one page at a time (press space for the next page, q to quit). |