Ligand Docking Server Documentation
Overview:
RosettaLigand is a tool for docking small molecules into proteins. RosettaLigand takes as input an SDF file containing the small molecule ligand to be docked and a PDB file containing the protein the ligand should be docked into.
Tips:
- By default, only the provided ligand conformers will be used. If you don't want to use conformers, provide an SDF file with a single ligand conformation.
- Conformers can be generated using OpenEye Omega, the BCL, MOE or other conformer generation tools. All conformers in the same SDF file should have the same name.
You can upload a single ligand conformation and check the “Generate ligand conformers with the BCL” option to have the server automatically generate a set of conformers using the BCL.
- All ligand conformers in the input file must have 3D coordinates and added hydrogens. (This applies even if the server-generated conformer option is being used.)
- The ROSIE Ligand docking protocol is not set up to do virtual high throughput screening (vHTS).
Each job submission should consist of a single small molecule being docked to a single protein.
Providing multiple, chemically distinct molecules in the input SDF file will result in an error.
- RosettaLigand cannot perform binding site detection. The location of the binding site should be known to within approximately 5 Å. SiteHound-web
can identify potential ligand binding sites with a probe molecule. The center coordinates in the SiteHound output should be entered as starting coordinates for ROSIE ligand docking. Multiple docking runs may be needed to
investigate multiple potential binding sites. SiteHound-web is provided as a suggestion and is not affiliated with ROSIE in any way.
- While RosettaLigand is usually capable of making accurate binding predictions, some protein systems are very difficult to dock into.
The following guidelines can help maximize the likelihood of obtaining high quality predictions:
- Docking performance is highly dependent on backbone conformation. If possible, use an X-ray crystal structure with a resolution better than 2.0 Å (that is, a numerical value below 2.0 Å) as input.
- Apo structures are more difficult to dock into. If possible, use an input structure co-crystallized with a bound inhibitor or native ligand. Docking performance
improves if the co-crystallized ligand is similar to the ligand being docked.
- If multiple crystal structures of the same protein exist, dock ligands into all of them.
- RosettaLigand is not optimized for docking into shallow binding pockets, or predicting surface binding interactions. For
best results, use relatively deep binding pockets.
- While RosettaLigand is capable of handling systems with co-factors, metal ions, or tightly bound waters at the protein-ligand interface,
these systems are enormously more complex, and the likelihood of RosettaLigand being capable of correctly handling these systems is reduced.
We strongly recommend against using this server for docking into protein-ligand systems with co-factors, metal ions, or waters at the interface.
- If a crystal structure with a bound inhibitor or native ligand exists, benchmark RosettaLigand by re-docking this inhibitor into the
crystal structure. If the lowest scoring model is not within 2.0 Å RMSD of the crystal structure, it is unlikely that Rosetta will be
capable of making accurate predictions with this protein system. See Interpreting Results for details.
- The RosettaLigand protocol used here typically requires at least 200 models to produce a high quality protein-ligand docking pose. (See DeLuca et al. PLoS ONE 10(7): e0132508 for performance details.)
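The input requirements listed in the tips above (a single molecule, one shared name across conformers, 3D coordinates, explicit hydrogens) can be checked locally before submission. The following is a minimal sketch and is not part of ROSIE. It assumes a fixed-column V2000 SDF file; the function names `parse_sdf_records` and `check_sdf_text` are hypothetical, and the checks are heuristics only (for example, a genuinely planar molecule lying in the z = 0 plane, or a molecule that has no hydrogens at all, would be flagged incorrectly).

```python
from collections import Counter


def _parse_molblock(lines):
    """Parse one V2000 record into (name, [(symbol, x, y, z), ...])."""
    name = lines[0].strip()
    n_atoms = int(lines[3][0:3])  # counts line: first 3 columns = atom count
    atoms = []
    for line in lines[4:4 + n_atoms]:
        x, y, z = float(line[0:10]), float(line[10:20]), float(line[20:30])
        atoms.append((line[31:34].strip(), x, y, z))
    return name, atoms


def parse_sdf_records(text):
    """Split SDF text on '$$$$' delimiter lines and parse each record."""
    records, current = [], []
    for line in text.splitlines():
        if line.strip() == "$$$$":
            if current:
                records.append(_parse_molblock(current))
            current = []
        else:
            current.append(line)
    if any(l.strip() for l in current):  # last record without a delimiter
        records.append(_parse_molblock(current))
    return records


def check_sdf_text(text):
    """Return a list of human-readable problems (empty if none were found)."""
    records = parse_sdf_records(text)
    if not records:
        return ["no records found"]
    problems = []
    names = {name for name, _ in records}
    if len(names) > 1:
        problems.append(f"records have different names: {sorted(names)}")
    # Heuristic identity check: conformers of one molecule share element counts.
    formulas = {
        tuple(sorted(Counter(sym for sym, *_ in atoms).items()))
        for _, atoms in records
    }
    if len(formulas) > 1:
        problems.append("records differ in element composition")
    for i, (_, atoms) in enumerate(records, 1):
        if all(abs(z) < 1e-6 for _, _, _, z in atoms):
            problems.append(f"record {i}: all z coordinates are zero (2D?)")
        if not any(sym == "H" for sym, *_ in atoms):
            problems.append(f"record {i}: no hydrogen atoms found")
    return problems
```

To use it, read the file contents (for example with `open("ligand.sdf").read()`, where the file name is a placeholder) and pass the text to `check_sdf_text`. An empty list means none of these heuristics raised a concern.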
Input:
In general, the default input parameters for the RosettaLigand server are reasonable. The parameters have the following definitions:
- Input PDB File -- **required** A PDB file containing the protein without the ligand present. Residues which Rosetta does not natively recognize
(such as waters, crystallographic reagents and cofactors) will be automatically removed, although for best results it is recommended to
edit the PDB file so that it contains only the protein to be docked into before uploading.
- Input SDF File -- **required** An SDF file containing the conformers of a *single* ligand to be docked. (At this time only the common 'V2000'-style SDF files are supported.)
If no conformers are available, the SDF file should contain only a single ligand conformation. Every record in the SDF file must have 3D coordinates and all hydrogens added.
If your SDF file is not being recognized by the RosettaLigand protocol, we recommend passing it through OpenBabel or opening and resaving with Avogadro
to normalize the file format. These programs can also be used to convert PDB, MOL2, and other formats to SDF format, as well as adding hydrogens and converting 2D representations to 3D.
- Generate ligand conformers with the BCL -- If checked, conformers will be generated with the BCL,
using the first structure in the provided ligand SDF file. These conformers will be added to the set of conformers present in the input file.
If unchecked, just the conformer library provided in the input SDF file will be used.
- Maximal number of ligand conformers to generate: -- If "Generate ligand conformers with the BCL" is checked,
this is the maximal number of conformers which will be generated.
In practice, fewer conformers than this may actually be generated, due to limits on the conformational flexibility of the molecule.
- Use the starting coordinates in the SDF -- If checked, the starting position of the ligand (the center of the binding site)
will be taken from the first conformer in the input ligand SDF. The server calculates the center point by averaging the X,Y,Z coordinates of all non-hydrogen atoms
in the ligand. If you use this option, it's highly recommended to check that the first conformer is positioned roughly in the correct binding site by loading
both the protein PDB and the ligand SDF in a structure viewing program like PyMOL prior to submission.
(Only the coordinates of the first conformer are used - subsequent conformers will be re-aligned.)
- X,Y,Z coordinates of starting position -- If "Use the starting coordinates in the SDF" is left unchecked,
the cartesian coordinates where the ligand should be initially placed prior to docking must be manually specified.
This should be as close as possible to the actual ligand binding site (within the "maximum radius to search"). If a crystal structure is available with a bound
ligand in the active site, the geometric center, or centroid, of that ligand is usually a good starting place. One way to approximate this value is by identifying
an atom near the center of the ligand and using the X,Y,Z coordinates of that atom from the PDB. More information on ligand coordinates in the PDB format can be found
in the HETATM description on page 190 of the PDB format guide.
If a bound ligand is not available, a good approach is to average the coordinates of atoms surrounding the desired binding pocket.
- Number of structures to generate -- The number of docking predictions to create. 200 is a good number of docking predictions for the current default settings.
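The heavy-atom averaging described above for starting coordinates can also be reproduced by hand for a co-crystallized ligand. The following is a minimal sketch, not the server's own code. It assumes standard fixed-column PDB HETATM records; the function name `ligand_centroid` and the residue name passed to it are hypothetical. When the element column is blank, the element is guessed from the first letter of the atom name, which is only a rough fallback.

```python
def ligand_centroid(pdb_text, resname):
    """Average the X,Y,Z coordinates of all non-hydrogen HETATM atoms
    whose residue name matches `resname`."""
    coords = []
    for line in pdb_text.splitlines():
        if not line.startswith("HETATM"):
            continue
        if line[17:20].strip() != resname:  # columns 18-20: residue name
            continue
        element = line[76:78].strip().upper()  # columns 77-78: element
        if not element:
            # Fallback guess: first letter of the atom name (columns 13-16).
            element = line[12:16].strip().lstrip("0123456789")[:1].upper()
        if element == "H":
            continue
        coords.append((
            float(line[30:38]),  # columns 31-38: X
            float(line[38:46]),  # columns 39-46: Y
            float(line[46:54]),  # columns 47-54: Z
        ))
    if not coords:
        raise ValueError(f"no heavy atoms found for residue {resname!r}")
    n = len(coords)
    return tuple(sum(c[i] for c in coords) / n for i in range(3))
```

The returned tuple can be entered as the X,Y,Z starting position. If the structure contains several copies of the same ligand, this sketch averages over all of them, so the input should first be trimmed to the copy in the binding site of interest.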
Advanced Settings:
The options below are considered “Advanced Settings”. Descriptions are provided along with examples of when you might adjust these settings. However, you do not need to change any of these settings in order to run ROSIE ligand docking.
- Ligand chain name -- This is what ROSIE will call the ligand chain when it's added to the output structures.
The default value here is "X", which is almost certainly acceptable. You may change this if your protein has a chain "X" or if you wish
to use a different letter to distinguish between different ligands from multiple docking runs.
- Maximum radius to search -- The radius from the starting position in Angstroms to sample during the initial phase of docking. 5 Å
is usually a good starting point and represents a spherical search volume of roughly 524 cubic Å. You may wish to alter this value based on the size of your binding pocket.
- Maximum number of cycles of low-resolution Monte Carlo Sampling -- Initial low-resolution sampling is a Monte Carlo process of rotation,
translation, and conformer selection. This is the number of Monte Carlo steps. 500 is a good default. This should be increased if the binding volume is
very large, or if the ligand has a high degree of conformational flexibility.
- Initial Perturbation -- The starting position and orientation of each output structure will be randomized, within a sphere of the given radius from the starting position.
The orientation is always randomized but you may wish to change the position randomization based on how precisely defined the ligand binding site is. This value should be smaller than
the "Maximum radius to search" parameter.
- Width of low-resolution scoring grid -- For speed, the low resolution sampling stage has a pre-computed scoring grid. For accurate scoring, this should
cover all the positions which the ligand can sample. The default value of 15 Å works well for drug-like small molecules. However, you may wish to increase this for larger molecules.
When in doubt, the grid should be made larger as the only cost is a slight increase in computation time.
- Low Resolution Monte Carlo move/angle steps -- The maximum size (in Å/degrees) of a single translational/rotational perturbation step in low resolution Monte Carlo sampling.
The defaults should be good for most cases. The size of the low resolution step should be increased if the search radius and initial perturbation are increased.
- Total cycles of highres docking to perform -- The total number of cycles of high resolution docking to perform. 6 is almost always a good number.
This value has a significant impact on run-time due to the computational cost of protein flexibility sampling.
- Repack every nth cycle of highres docking -- High resolution docking consists of alternating cycles of repacking and small perturbation moves.
This option specifies how often repacking occurs. 3 is a good default.
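Two of the relationships stated in the settings above are easy to check numerically: the search volume implied by the "Maximum radius to search" (a sphere), and the recommendation that the "Initial Perturbation" be smaller than that radius. The following is a minimal sketch; the function names `search_volume` and `check_radii` are hypothetical and are not part of ROSIE.

```python
import math


def search_volume(radius):
    """Volume (cubic Å) of a sphere with the given search radius (Å)."""
    return 4.0 / 3.0 * math.pi * radius ** 3


def check_radii(max_radius, initial_perturbation):
    """Return warnings if the perturbation is not smaller than the radius."""
    warnings = []
    if initial_perturbation >= max_radius:
        warnings.append(
            "initial perturbation should be smaller than the maximum "
            "radius to search"
        )
    return warnings


if __name__ == "__main__":
    print(f"{search_volume(5.0):.1f}")  # 5 Å radius -> 523.6
    print(check_radii(max_radius=5.0, initial_perturbation=6.0))
```

Because the volume grows with the cube of the radius, doubling the search radius multiplies the volume to be sampled by eight, which is one reason to increase the number of low-resolution Monte Carlo cycles along with it.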
Interpreting Results:
In general, the interface energy is the best metric for discriminating between ligand binding poses. RosettaLigand only minimizes protein atoms within 7 Å of the ligand while treating the rest of the receptor as rigid. However, because the Rosetta energy function is evaluated as a sum over all residues in the protein, the total score is generally very noisy. Thus, we recommend selecting the poses with the lowest interface_delta score. That said, structures with an abnormally high total score (compared to the other structures in the run) may indicate a docked conformation that has contorted the protein in order to bind the ligand.
The transform_accept_ratio in the scorefile gives a rough diagnostic of how well the low-resolution stage performed. This should normally be between 0.2 and 0.8. Having more than a few structures with a transform_accept_ratio of 0 means that either the initial perturbation is set too high (for best results, this should normally be less than two thirds of the pocket size), the grid size is too small (this should normally be set to more than the length of the ligand plus twice the pocket size), or the pocket is too small for the ligand size (try additional conformers or a different starting backbone).
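The diagnostic above can be turned into a quick check. The following is a minimal sketch that takes the transform_accept_ratio values as a plain list of numbers (how they are extracted from the scorefile is left out) and applies the two rules of thumb from the paragraph above. The function names are hypothetical, all lengths are in Å, and "pocket size" is used exactly as in the text, without assuming whether it means a radius or a diameter.

```python
def summarize_accept_ratios(ratios, low=0.2, high=0.8):
    """Count structures with a ratio of 0 and those outside [low, high]."""
    return {
        "n": len(ratios),
        "zero": sum(1 for r in ratios if r == 0),
        "outside_range": sum(1 for r in ratios if not (low <= r <= high)),
    }


def check_rules_of_thumb(pocket_size, ligand_length,
                         initial_perturbation, grid_width):
    """Apply the two setting guidelines; return a list of warnings."""
    warnings = []
    # Perturbation should be less than two thirds of the pocket size.
    if initial_perturbation >= (2.0 / 3.0) * pocket_size:
        warnings.append("initial perturbation may be too high")
    # Grid should exceed ligand length plus twice the pocket size.
    if grid_width <= ligand_length + 2.0 * pocket_size:
        warnings.append("scoring grid may be too small")
    return warnings
```

For example, `summarize_accept_ratios([0.0, 0.5, 0.9, 0.3])` reports one structure with a ratio of 0 and two outside the 0.2 to 0.8 range. If the "zero" count is more than a few, the settings and the pocket are worth revisiting as described above.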
If a structure co-crystallized with a bound inhibitor is available, the native ligand should be re-docked into the crystal structure using the same input settings as will be used in the experimental study. This re-docking serves as a validation experiment to determine whether RosettaLigand is capable of correctly modeling the protein system. If the lowest scoring model has an RMSD of more than 2.0 Å relative to the crystal structure pose, it suggests that you are working with a protein system that Rosetta is unable to model effectively, and any predictions generated by this server should be viewed with skepticism.
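The re-docking check above comes down to a ligand RMSD against the crystal pose. The following is a minimal sketch of that calculation under several simplifying assumptions: both coordinate lists contain the same heavy atoms in the same order, the model and the crystal structure share the same protein frame (so no superposition is performed), and chemically equivalent (symmetric) atoms are not remapped. The function names `ligand_rmsd` and `redock_passes` are hypothetical.

```python
import math


def ligand_rmsd(model_coords, reference_coords):
    """Root-mean-square deviation (Å) between matched atom positions.

    Each argument is a list of (x, y, z) tuples in identical atom order.
    """
    if not model_coords or len(model_coords) != len(reference_coords):
        raise ValueError("coordinate lists must be non-empty and equal length")
    squared = sum(
        (a - b) ** 2
        for m, r in zip(model_coords, reference_coords)
        for a, b in zip(m, r)
    )
    return math.sqrt(squared / len(model_coords))


def redock_passes(rmsd, cutoff=2.0):
    """True if the re-docked pose is within the 2.0 Å cutoff."""
    return rmsd <= cutoff
```

Because symmetry is ignored, a ligand with a flipped but equivalent ring can appear to have a larger RMSD than it really does, so a borderline failure is worth inspecting visually before drawing conclusions.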
When docking a ligand with unknown activity or binding position, generate at least 200 models, and select the best 1-20 models by interface_delta score (lower scores are better). The interface_delta score is the difference between the total Rosetta energy score with the ligand bound and with the ligand unbound. In general, RosettaLigand is capable of discriminating between well and poorly bound ligands based on score.
The small ensemble of top scoring models should then be evaluated visually using a tool like PyMOL. The predicted binding poses should be evaluated in the context of existing crystal structure information and whatever other experimental or structural data is available to you.
Alternate Protocols:
RosettaLigand is intended to dock small molecule ligands only (metabolite- or drug-like organic molecules). It is not intended for docking protein, peptide, or nucleic acid ligands.
- For protein-protein docking see the ROSIE docking server.
- For protein-peptide docking see the FlexPepDock server.
- There is not currently a server for protein-nucleic acid docking with Rosetta.
We welcome scientific and technical comments on our server. For support, please contact us through the Rosetta Forums with any comments, questions, or concerns.
Modeling tools were developed by the Meiler Lab at Vanderbilt University. The ROSIE implementation was developed by Samuel DeLuca, Rocco Moretti and Sergey Lyskov.