WaterDock is a fast and accurate method that predicts the location of water molecules in protein structures. The method's simplicity is the key to its speed: water molecules are docked into a structure using AutoDock Vina, low-scoring sites are removed and the rest are clustered. The centroids of the clusters are the predicted water sites.
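The filter-and-cluster step described above can be sketched in a few lines of Python. This is a minimal illustration, not the WaterDock script itself: the function name `predict_water_sites`, the score cutoff, the cluster radius and the greedy clustering scheme are all assumptions chosen for the example. The docked poses are assumed to have been parsed already into oxygen coordinates and Vina scores, where a more negative score is better.

```python
import math


def centroid(points):
    """Arithmetic mean of a list of (x, y, z) tuples."""
    n = len(points)
    return tuple(sum(p[i] for p in points) / n for i in range(3))


def predict_water_sites(poses, score_cutoff, cluster_radius):
    """Filter docked water poses by score, cluster them, return cluster centroids.

    poses          : list of ((x, y, z), score) pooled from all docking runs
    score_cutoff   : hypothetical threshold; poses scoring worse (higher) are dropped
    cluster_radius : hypothetical distance (in Å) used to assign a pose to a cluster
    """
    # Keep only poses whose score is at least as good as the cutoff.
    kept = [(xyz, score) for xyz, score in poses if score <= score_cutoff]
    # Visit the best-scoring poses first so that they seed the clusters.
    kept.sort(key=lambda pose: pose[1])

    clusters = []  # each cluster is a list of coordinates; element 0 is the seed
    for xyz, _ in kept:
        for members in clusters:
            if math.dist(xyz, members[0]) <= cluster_radius:
                members.append(xyz)
                break
        else:
            clusters.append([xyz])

    return [centroid(members) for members in clusters]


# Toy input: two overlapping poses, one isolated pose, one poorly scoring pose.
example_poses = [
    ((0.0, 0.0, 0.0), -1.0),
    ((0.4, 0.0, 0.0), -0.9),
    ((5.0, 5.0, 5.0), -0.8),
    ((9.0, 9.0, 9.0), -0.1),
]
example_sites = predict_water_sites(example_poses, score_cutoff=-0.5, cluster_radius=1.6)
```

In this toy input the two overlapping poses are averaged into one site, the isolated pose forms its own site, and the poorly scoring pose is discarded. A greedy seed-based scheme like this one does not by itself guarantee a minimum separation between the resulting centroids.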
Because of the occasionally dubious nature of water molecules in protein structures, we were careful when we assessed the accuracy of WaterDock. We validated it using high-resolution crystal structures, neutron diffraction data and molecular dynamics simulations. The validation set comprised 7 different proteins and protein-ligand complexes whose structures had been resolved more than once. We defined "consensus" waters as water molecules that were within 1 Å of another water molecule seen in at least one other structure. These water molecules were used to assess the true positive rate of WaterDock. The binding site water molecules that were seen in only one structure were retained in order to quantify the false positive rate. Using a maximum error of 1.5 Å, WaterDock predicted 81% of consensus water molecules with a false positive rate of 24%. Using 14 structures of OppA bound to different lysine-X-lysine tripeptides as the test set, WaterDock predicted 97% of the ordered water molecules, with on average 1 false positive per structure.
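To make the assessment criteria concrete, the sketch below computes consensus waters and the two rates for a toy example. It assumes that all structures have already been superimposed into a common frame. It also assumes one plausible reading of the false positive rate: the fraction of predictions with no observed water, consensus or otherwise, within the maximum error. The function names are hypothetical and the exact bookkeeping used in the paper may differ.

```python
import math


def consensus_waters(reference, others, tol=1.0):
    """Waters in `reference` lying within `tol` Å of a water in at least one other structure.

    reference : list of (x, y, z) water oxygen positions in one structure
    others    : list of structures, each a list of (x, y, z) water positions
    """
    return [
        w for w in reference
        if any(math.dist(w, o) <= tol for structure in others for o in structure)
    ]


def score_predictions(predicted, consensus, all_observed, max_error=1.5):
    """Return (true positive rate, false positive rate) for a set of predictions.

    True positive rate : fraction of consensus waters with a prediction within max_error.
    False positive rate: fraction of predictions with no observed water within
                         max_error (an assumed definition, see the text above).
    """
    found = sum(
        1 for w in consensus
        if any(math.dist(w, p) <= max_error for p in predicted)
    )
    unmatched = sum(
        1 for p in predicted
        if not any(math.dist(p, w) <= max_error for w in all_observed)
    )
    tpr = found / len(consensus) if consensus else 0.0
    fpr = unmatched / len(predicted) if predicted else 0.0
    return tpr, fpr


# Toy data: three waters in the reference structure, two other structures.
reference = [(0.0, 0.0, 0.0), (4.0, 0.0, 0.0), (8.0, 0.0, 0.0)]
others = [[(0.3, 0.0, 0.0), (4.5, 0.0, 0.0)], [(20.0, 0.0, 0.0)]]
consensus = consensus_waters(reference, others)

predicted = [(0.5, 0.0, 0.0), (8.2, 0.0, 0.0), (12.0, 0.0, 0.0)]
tpr, fpr = score_predictions(predicted, consensus, all_observed=reference)
```

Here the water at 8 Å is seen in only one structure, so it does not count towards the true positive rate, but a prediction next to it is not penalised as a false positive.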
Some Points to Consider when Using WaterDock
More details can be found in the paper, and the WaterDock script is supplied in the Supplementary Material (available via the main article online). For people who wish to use the method, some points for consideration are below:
- AutoDock Vina can produce a maximum of 20 binding modes in each docking run. In the WaterDock method, a single water molecule is independently docked 3 times, creating a maximum of 60 potential water sites. Part of WaterDock's accuracy comes from the fact that in the clustering stage, many overlapping sites are 'averaged over' to produce the final predictions. WaterDock was designed and tested on protein binding sites and the volume of the docking box we commonly used was 15 Å³, or just enough to encompass a potential ligand. This means that the method has been validated for a particular density of potential water sites. Hence, if one wishes to predict water molecules for a much larger volume (like an entire protein), we recommend that water molecules should be docked using adjacent or overlapping docking boxes.
- AutoDock Vina ignores the location of hydrogen atoms and uses them only for classifying atom types. Because of this, the WaterDock method can only predict the location of water oxygen atoms.
- AutoDock Vina is a stochastic docking algorithm. Part of the reason we dock a water molecule 3 times is to account for the variability in Vina. Nevertheless, stochasticity is an intrinsic part of the method and different applications of the WaterDock method may produce slightly different results.
- Since the WaterDock prediction is made up of independent docking runs, water-water interactions are not considered. This means that water site locations are not optimised to form hydrogen bonds with each other. Also, because of the clustering method, while no predicted waters can be within 1.6 Å of each other, they can be separated by a distance just above this. Water-water interactions could be included by performing an energy minimisation on the predictions.
- The placement scripts are available as part of the supplementary information. The classifier (currently written as an R script) is available on request.
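For the larger-volume case mentioned in the first point above, one way to lay out overlapping docking boxes is sketched here. This is an illustration under stated assumptions rather than part of the WaterDock script: the helper names, the cubic box with a 15 Å edge and the 3 Å overlap are hypothetical choices for the example. The keys written to the configuration text (`center_x`, `size_x`, `num_modes` and so on) are standard AutoDock Vina options.

```python
import itertools
import math


def axis_centres(lo, hi, box_size, overlap):
    """Centres of boxes along one axis covering [lo, hi] with at least `overlap` Å overlap."""
    if overlap >= box_size:
        raise ValueError("overlap must be smaller than box_size")
    span = hi - lo
    if span <= box_size:
        # A single box centred on the region is enough.
        return [lo + span / 2.0]
    step = box_size - overlap
    n = math.ceil((span - box_size) / step) + 1
    # Spread the boxes evenly so the first and last sit flush with the edges.
    spacing = (span - box_size) / (n - 1)
    return [lo + box_size / 2.0 + i * spacing for i in range(n)]


def tile_boxes(lower, upper, box_size, overlap):
    """Centres (x, y, z) of cubic boxes tiling the region between two corners."""
    per_axis = [axis_centres(a, b, box_size, overlap) for a, b in zip(lower, upper)]
    return list(itertools.product(*per_axis))


def vina_box_config(centre, box_size, num_modes=20):
    """Text of the search-space part of a Vina configuration file for one box."""
    cx, cy, cz = centre
    lines = [
        f"center_x = {cx:.3f}",
        f"center_y = {cy:.3f}",
        f"center_z = {cz:.3f}",
        f"size_x = {box_size:.3f}",
        f"size_y = {box_size:.3f}",
        f"size_z = {box_size:.3f}",
        f"num_modes = {num_modes}",
    ]
    return "\n".join(lines)


# Illustrative region of 39 x 15 x 10 Å tiled with 15 Å boxes overlapping by 3 Å.
centres = tile_boxes((0.0, 0.0, 0.0), (39.0, 15.0, 10.0), box_size=15.0, overlap=3.0)
configs = [vina_box_config(c, box_size=15.0) for c in centres]
```

Each box would then be docked separately and the resulting sites pooled. Sites in the overlap regions may be found more than once, so the pooled set would need to be clustered again before use.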
WaterDock is maintained by the Structural Bioinformatics and Computational Biochemistry Unit at the Department of Biochemistry in the University of Oxford.
We are happy to help with running and interpreting the results of WaterDock. Contact Dr. Philip Biggin in the first instance.
Please cite the following article in any work that makes use of WaterDock: