Setting up a protein for simulations - 2008

Phil Biggin philip.biggin@bioch.ox.ac.uk

Table of Contents.

  1. About this practical
  2. Practicalities
  3. Tute 1 - Let's go!
  4. Further Reading
  5. Appendix I - Basic Unix Commands
  6. Appendix II - Useful Software
  7. Acknowledgements

  1. About this Practical.

    This document should also be available online at http://sbcb.bioch.ox.ac.uk/phil/teaching/ccpb-prac.html.

    This practical is designed to demonstrate the sorts of questions and problems that arise when setting up a protein for simulation. Most of the steps involved can be carried out through simple web pages. Very little previous knowledge is assumed.

  2. Practicalities.

    In this practical session, it is assumed that the user is reasonably familiar with some basic Linux/Unix command line tools. See Appendix I - Basic Unix Commands if you need some help in this respect. In the following, the % symbol is used to indicate the command line prompt. Where indicated, enter everything after this symbol. For visualization we will use the freely available molecular graphics package VMD.

  3. Tute 1 - Let's go!

    First create a directory in which we will put all the work from this session:-
    % mkdir setting-up
    
    Start a web browser (Firefox or whatever) and go to the Protein Data Bank (www.rcsb.org). Enter the PDB code 1KYN in the site search box at the top and hit the site search button. The entry for the protein should come up. Select download from the left-hand column and save the file to the setting-up directory.

    When you have downloaded a PDB file from the Protein Data Bank it's always a good idea to look at it both graphically and as a text file. The former will point out 3D issues immediately, whilst the latter will tell you what the crystallographers actually did and what that might mean for your simulation. Let us look at this protein using VMD:-

    % vmd 1KYN.pdb
    
    The first thing you should notice is that this protein is a dimer. The first question you should ask yourself is "is that biologically relevant?". How do you know?

    Sometimes, but not always, the crystallographers will have determined the biologically relevant unit by other means, in which case you might see the following lines in the PDB file:-

    REMARK 350 BIOMOLECULE: 1                                                       
    REMARK 350 AUTHOR DETERMINED BIOLOGICAL UNIT: MONOMERIC                         
    REMARK 350 APPLY THE FOLLOWING TO CHAINS: A                                     
    REMARK 350   BIOMT1   1  1.000000  0.000000  0.000000        0.00000            
    REMARK 350   BIOMT2   1  0.000000  1.000000  0.000000        0.00000            
    REMARK 350   BIOMT3   1  0.000000  0.000000  1.000000        0.00000  
    
    You can see here that the monomer is the biologically relevant unit. Sometimes this information is not present, in which case you have to either rely on the literature or use the PQS server, which tries to build the most likely quaternary structure for you.
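
    If you would rather not scroll through the header by hand, the same check can be scripted. The following is only a minimal sketch: the function names are made up for this example, and it assumes the REMARK 350 lines are laid out as in the excerpt above.

```python
def biological_unit_remarks(pdb_text):
    """Return the text of every REMARK 350 line, without the 'REMARK 350' prefix."""
    remarks = []
    for line in pdb_text.splitlines():
        # "REMARK 350" occupies the first 10 characters of the line.
        if line.startswith("REMARK 350"):
            remarks.append(line[10:].strip())
    return remarks


def author_biological_unit(pdb_text):
    """Return the author-determined unit (e.g. 'MONOMERIC'), or None if it is not stated."""
    key = "AUTHOR DETERMINED BIOLOGICAL UNIT:"
    for remark in biological_unit_remarks(pdb_text):
        if remark.startswith(key):
            return remark[len(key):].strip()
    return None
```

    For example, author_biological_unit(open("1KYN.pdb").read()) reads the downloaded file and returns the stated unit. A result of None means only that the line is absent, not that the biological unit is unknown.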

    We can also use the graphics program to determine whether there are any non-protein groups (including ions, water of solvation and drug molecules) bound. In VMD select Graphics-->Representations, which will bring up another dialogue box. Click on "Create Representation" and type "not protein" into the selected atoms box. Then change the drawing method to VDW. You should see that in each monomer there is a drug molecule bound. Parameterizing drug molecules is a whole tutorial in itself, so for this practical we will remove them later.
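
    The text file can give a rough cross-check of what VMD shows. The sketch below counts HETATM atoms per residue name and chain using the fixed-column PDB layout (residue name in columns 18-20, chain ID in column 22). The function name is invented for this example, and HETATM records are not guaranteed to match VMD's "not protein" selection exactly.

```python
from collections import Counter


def hetero_groups(pdb_text):
    """Count HETATM atoms per (residue name, chain ID) pair."""
    counts = Counter()
    for line in pdb_text.splitlines():
        # Skip anything that is not a HETATM record or is too short to parse.
        if line.startswith("HETATM") and len(line) >= 22:
            resname = line[17:20].strip()  # columns 18-20
            chain = line[21]               # column 22
            counts[(resname, chain)] += 1
    return counts
```

    Waters normally appear under the residue name HOH, so anything else in the result is a candidate ion, ligand or modified residue worth looking up in the header.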

    If you look carefully at this protein you will notice that there are no protons (hydrogen atoms). That is because at resolutions lower (that is, numerically larger) than about 1.0 Angstrom it is difficult to see the electron density for them.

    Finally, let us take a look at residue 239. See if you can figure out how to display residue 239 in CPK format.

    What do you notice about it?

    If you look at this file (using the graphical editor nedit for example):-

    % nedit 1KYN.pdb
    
    You can see that most of what we highlighted above is also described in the Header section of the PDB along with a whole lot more information including how the crystal was refined and the conditions for crystallization. You should also scroll down to residue 36.

    What do you notice in that region of the file?

    Most simulation packages will ignore this insertion code and simply renumber the residues starting from 1, so you should keep that in mind.
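
    To see how renumbering would shift things, it can help to list the insertion codes and the sequential numbers a package would assign. This is a minimal sketch with made-up function names. It assumes the standard fixed-column layout (residue number in columns 23-26, insertion code in column 27) and simply numbers residues in file order, which may differ from what a particular package does.

```python
def residues_with_insertion_codes(pdb_text):
    """Return unique (chain, residue number, insertion code, residue name) tuples in file order."""
    found = []
    seen = set()
    for line in pdb_text.splitlines():
        if not line.startswith(("ATOM  ", "HETATM")) or len(line) < 27:
            continue
        icode = line[26]  # column 27; a blank means no insertion code
        if icode == " ":
            continue
        key = (line[21], int(line[22:26]), icode, line[17:20].strip())
        if key not in seen:
            seen.add(key)
            found.append(key)
    return found


def renumbering_map(pdb_text, chain):
    """Map each (residue number, insertion code) of one chain to a 1-based index in file order."""
    mapping = {}
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  ") and len(line) >= 27 and line[21] == chain:
            key = (int(line[22:26]), line[26].strip())
            if key not in mapping:
                mapping[key] = len(mapping) + 1
    return mapping
```

    Looking up a residue of interest in renumbering_map(text, "A") tells you which number to expect after a package has renumbered the chain from 1.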

    In our case we want to simulate the monomer without any compounds in the binding site. The first issue is how to extract the monomer. There are a couple of ways to do this. The old-fashioned way is to simply open the file in nedit and delete all ATOM records that are not applicable (i.e. chain B, for example). But think about this: not all chains are equal, and you may make things harder for yourself if the chain you keep has missing backbone atoms, for example.
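
    The manual edit can also be scripted, which makes it easy to compare chains before deciding which one to keep. The following is a rough sketch only: the function names are invented for this example, and it touches only coordinate records (ATOM, HETATM, ANISOU, TER), leaving header records that mention the other chain untouched.

```python
from collections import Counter

# Record types that carry a chain ID in column 22.
COORDINATE_RECORDS = {"ATOM", "HETATM", "ANISOU", "TER"}


def atoms_per_chain(pdb_text):
    """Count ATOM records per chain ID, as a quick check of how complete each chain is."""
    counts = Counter()
    for line in pdb_text.splitlines():
        if line.startswith("ATOM  ") and len(line) > 21:
            counts[line[21]] += 1
    return counts


def keep_chain(pdb_text, chain):
    """Drop coordinate records that belong to any chain other than `chain`."""
    kept = []
    for line in pdb_text.splitlines():
        record = line[:6].strip()
        if record in COORDINATE_RECORDS and len(line) > 21 and line[21] != chain:
            continue
        kept.append(line)
    return "\n".join(kept) + "\n"
```

    A noticeably lower atom count for one chain in atoms_per_chain is a hint that it has missing atoms or residues, but it is no substitute for reading the header and looking at the structure.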

    An alternative is to use the www.rcsb.org site, which has a download option that will give you the biologically relevant unit (where known). We will do the latter. Enter 1KYN into the PDB search query and then, from the download options, select the one called "Biological Unit 1 gz" and download it into your setting-up directory. The gz suffix indicates a gzip-compressed file, so uncompress it if your browser has not already done so. Call it 1kyn-biounit.pdb. You should, of course, always check that it really has given you what you expected, and that means checking with VMD again.

    We now have the correct biological unit, but we still have the ligand molecule present in this case. We need to get rid of that. The easiest way is to simply open the file with your favourite graphical editor and remove the HETATM lines.
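
    If you prefer not to delete the lines by hand, the same edit can be done with a few lines of Python. This is a sketch under stated assumptions: the function name and the keep_resnames option are invented for this example, and the output file name in the usage note is hypothetical. Bear in mind that some structures store modified residues (selenomethionine, MSE, is a common case) as HETATM records, so check what you are about to remove.

```python
def strip_hetatm(in_path, out_path, keep_resnames=()):
    """Copy a PDB file without its HETATM lines and return how many lines were removed.

    Residue names listed in keep_resnames (for example "HOH") are left in place.
    """
    removed = 0
    with open(in_path) as src, open(out_path, "w") as dst:
        for line in src:
            # Residue name sits in columns 18-20 of a HETATM record.
            if line.startswith("HETATM") and line[17:20].strip() not in keep_resnames:
                removed += 1
                continue
            dst.write(line)
    return removed
```

    For example, strip_hetatm("1kyn-biounit.pdb", "1kyn-apo.pdb") writes the ligand-free copy to a new file and leaves the downloaded file untouched.
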
    By this stage you should have a file with a monomer and no drug molecules bound. The next stage is to build in the missing atoms and add the protons. Fortunately for you guys this has all been rather well automated by the What-If server:-

    http://swift.cmbi.ru.nl/servers/html/index.html

    We will use the server that builds/checks/repairs pdbs. Click on "Build/Check/Repair model" and then select "Prepare PDB file for docking programs".

    Upload your recently prepared apo biologically relevant structure and submit the request. It should take less than 1 minute to process and then you should click on done and download the resulting prepdock.pdb file.

    Open this file in VMD and examine residue 239 (type resid 239 into the selection box), which was previously not built by the crystallographers.
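
    Alongside the visual check, you can list which atoms the server added to a given residue by comparing the file you uploaded with the one you downloaded. The sketch below uses made-up function names and assumes that the chain ID and residue numbering are the same in both files, which you should confirm for the server output.

```python
def atom_names(pdb_text, chain, resseq, icode=" "):
    """Return the atom names of one residue, in file order."""
    names = []
    for line in pdb_text.splitlines():
        if (line.startswith(("ATOM  ", "HETATM")) and len(line) >= 27
                and line[21] == chain
                and int(line[22:26]) == resseq
                and line[26] == icode):
            names.append(line[12:16].strip())  # atom name, columns 13-16
    return names


def atoms_added(before_text, after_text, chain, resseq):
    """Return atom names present in the residue after rebuilding but not before."""
    before = set(atom_names(before_text, chain, resseq))
    return [name for name in atom_names(after_text, chain, resseq) if name not in before]
```

    Calling atoms_added on the two files for chain "A", residue 239 lists the newly built atoms, including the added hydrogens. An empty list for a residue you expected to be rebuilt suggests the numbering or chain ID changed.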

    Does the rotamer position look sensible?

    Also examine HIS57 (resid 57 in vmd selection).

    What do you notice about this histidine compared to the other histidine residues in the structure? Does it appear sensible?

    You should now have a protein structure that has all hydrogens added and is ready to proceed to the next stage of the practical. If you finish early, pick one of the following proteins and summarize any potential problems/errors associated with its PDB file:-

    1VRH
    1AQW
    1DG5
    1EI1
    1HFC
    1LCP
    1LMO
    1OKL
    6ABP
    1MTW

    That concludes this practical. I welcome all comments (including negative ones) and/or suggestions - please feel free to email philip.biggin@bioch.ox.ac.uk.

  4. Further Reading.

    PDB Checkers

    Gerard Kleywegt has an excellent webpage on pdb errors:-

    pKa calculations

    Some excellent simulation references

  5. Appendix I - Basic Unix/Linux Commands.

    ls -lrt            provides a "long" list of all files in the current directory in reverse order of time (most recently modified last).
    cd dir             change directory to the directory 'dir'
    pwd                print the current working directory on the screen
    rm file            delete (remove) 'file'
    mv file newfile    rename file to newfile
    cat file           print the contents of file to the screen
    more file          print the contents of file to the screen, one screenful at a time.

  6. Appendix II - Useful Packages.

  7. Acknowledgements.

    I also thank the Wellcome Trust and the Oxford Supercomputing Centre.