FARFAR2 Server Documentation
Overview:
FARFAR2 is a tool for building a 'full model' of a medium-size noncoding RNA. The goal of FARFAR2 is to provide a flexible and extensible fragment assembly protocol, capable of stitching together multiple subsections of an RNA structure or building a model entirely from scratch. We have exposed options sufficient to allow the user to reproduce any benchmark case run in (Watkins and Das, 2020), with helix flexibility provided in a kinematically realistic way using libraries of base pair steps.
We've provided two interfaces to the FARFAR2 webserver, because we expect the majority of our users will be happy just to input a sequence and a secondary structure -- the simple interface -- but we want to make sure that a lot of options are available for users with more complicated modeling tasks. When providing a secondary structure (in "dot-bracket" notation) you will want to confirm that all the base pairs you specify are canonical, Watson-Crick base pairs or wobbles. Noncanonical base pairs may be supplied via the advanced interface as part of a "general" secondary structure. Finally, while it is possible to model multi-chain RNAs with the "simple" interface (just separate distinct chains in both your sequence and secondary structure with commas), all the chains need to be connected by base pairs. (It is also a little easier to model multi-chain RNAs by providing a FASTA file, since that way you can control what the chain letters and numbering will be after modeling.)
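Before submitting, it can help to check locally that a sequence and secondary structure agree. The following is a minimal sketch, not part of the server: the function names are hypothetical, and it assumes lowercase a/c/g/u input with commas separating chains. It also accepts the extra bracket types used for non-nested stems, but not lowercase-letter levels.

```python
# Pairs accepted by the simple interface: Watson-Crick pairs and G-U wobbles.
CANONICAL = {("a", "u"), ("u", "a"), ("g", "c"), ("c", "g"), ("g", "u"), ("u", "g")}
# Closing bracket -> opening bracket.
BRACKETS = {")": "(", "]": "[", "}": "{", ">": "<"}


def parse_pairs(secstruct):
    """Return sorted 0-based (i, j) pairs from a dot-bracket string without commas."""
    stacks = {opener: [] for opener in BRACKETS.values()}
    pairs = []
    for i, ch in enumerate(secstruct):
        if ch in stacks:
            stacks[ch].append(i)
        elif ch in BRACKETS:
            stack = stacks[BRACKETS[ch]]
            if not stack:
                raise ValueError(f"unmatched '{ch}' at position {i + 1}")
            pairs.append((stack.pop(), i))
        elif ch != ".":
            raise ValueError(f"unexpected character '{ch}' at position {i + 1}")
    for opener, stack in stacks.items():
        if stack:
            raise ValueError(f"unclosed '{opener}' at position {stack[-1] + 1}")
    return sorted(pairs)


def noncanonical_pairs(sequence, secstruct):
    """Return (i, j, bases) for every specified pair that is not canonical.

    Positions are 1-based and count across all chains in input order.
    """
    seq_chains = sequence.lower().split(",")
    ss_chains = secstruct.split(",")
    if [len(c) for c in seq_chains] != [len(c) for c in ss_chains]:
        raise ValueError("sequence and secondary structure chain lengths differ")
    seq = "".join(seq_chains)
    problems = []
    for i, j in parse_pairs("".join(ss_chains)):
        if (seq[i], seq[j]) not in CANONICAL:
            problems.append((i + 1, j + 1, seq[i] + seq[j]))
    return problems


if __name__ == "__main__":
    print(noncanonical_pairs("gggcgcaagccu", "((((....))))"))
    print(noncanonical_pairs("cgaaag,cgaaag", "(....(,)....)"))
```

An empty list means every specified pair is a Watson-Crick pair or a wobble. Anything reported would need to go through the advanced interface as part of a "general" secondary structure.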
The FARFAR2 advanced interface provides many additional input options. You may supply local templates as PDB files: these would be previously solved regions of structure that you suspect will reliably fold into the same conformation in this modeling problem as well. (Classic examples where this strategy works well include well-defined motifs like kink-turns or loop-E motifs, and well-conserved ligand binding sites like the S-adenosyl methionine four-way junction binding site common to multiple SAM riboswitches.) Conversely, if you know the approximate relative orientation of some segments of RNA structure -- suppose you know which pairs of helices are stacked around a multi-way junction -- you can supply an "alignment" file to constrain the helices in question to that relative orientation.
We provide three options for scoring functions. The first is the original method used by FARFAR (Das, 2010). The second is the optimized method from the FARFAR2 paper (Watkins, 2020), and is the default. The third is a 'beta' method that may be of some interest to specific researchers.
Using the advanced interface, you may also select different fragment sources. We suggest the most up-to-date fragment library created for the FARFAR2 publication, but older sources remain available as controls. We also provide a number of options that control how the fragment library is used. For example, we have developed a feature that allows users to make predictions of previously deposited structures under "like-blind" conditions, which of course requires any RNAs that are too similar to be eliminated from the fragment library. We allow fragments from the desired library to be eliminated if they are locally too similar in conformation to the native; users of this server may select that RMSD radius, as well as the stringency with which the fragment sequence is matched to the native. Finally, we allow the generation of additional random fragment samples within a customizable dihedral distance of the library fragment.
To specify the actual input modeling problem, the advanced interface allows the provision of a FASTA file. This matters because it lets you number input helices or other local templates so that they correspond to the overall modeling problem. Don't supply a completed structure as the input to FARFAR2 -- it will notice that your structure is identical to your supplied FASTA and will do no work! Instead, supply the residues you do know, and FARFAR2 will fill in the rest.
If you provide a native structure, then FARFAR2 will compute the RMSD to that native structure for each decoy. (Make sure that the chains and residue numbering of your native match the FASTA file you provided, so that FARFAR2 can build a correspondence from models to the native and calculate RMSD.) If no native is provided (perhaps none has been solved yet!), then the server will instead determine the lowest-energy model and compute RMSDs to that model. The resulting folding funnel can be a useful visual reference for how well the models are converging, and thus how likely your predictions are to be correct.
Finally, FARFAR2 has a number of methods for incorporating experimental data. If you have an RDAT file describing chemical mapping data, or a file of NMR chemical shifts, either can guide modeling.
Similarly, if you have experimental knowledge that could translate into restraints, it may be provided as a Rosetta constraint file. A simple example might be knowledge of the distance between a pair of atoms or the likely value of a dihedral angle. (You may intend something subtle here: for example, rather than believing that a certain atom pair distance is necessarily true, you may be interested in answering the question "what would the resulting structural ensemble look like if these two atoms had to be at this distance?" and comparing the properties of that ensemble to one predicted without restraints. If the imposition of a restraint results in an ensemble of (heuristically) terrible structures, that restraint may be implausible.) You may refer to documentation of Rosetta constraint file format here. Those restraints may be applied all at once or progressively based on the primary sequence separation of the residues in question -- this "staging" of restraints is recommended for one particular experimental application.
We have found that restraints are an excellent way to encode the results of MOHCA-seq experiments, and that the resulting simulations produce accurate predictions of experimental structures. MOHCA-seq experiments produce "strong" and "weak" signals of nucleotide-nucleotide proximity whose functional form may be expressed as the sum of two functions, whose weights are given by the strength of the constraint. A "strong" restraint between residues 2 and 38 would be specified via:
  AtomPair O2' 2 C4' 38 FADE 0 30 15 -4.00 4.00
  AtomPair O2' 2 C4' 38 FADE -99 60 30 -36.00 36.00
while a "weak" restraint would be:
  AtomPair O2' 2 C4' 38 FADE 0 30 15 -0.80 0.80
  AtomPair O2' 2 C4' 38 FADE -99 60 30 -7.20 7.20
that is, one-fifth the strength. For a specific example of a simulation run on ROSIE using MOHCA-seq style constraints, please see this repository.
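If you have many MOHCA-seq hits, writing these lines by hand gets tedious. The following is a minimal sketch that generates them, assuming every hit uses the same two FADE terms shown above and that "weak" is exactly one-fifth of "strong". The function name and the parameter labels in the comments are illustrative rather than taken from the server.

```python
# The two FADE terms from the "strong" example above. The labels reflect the
# usual reading of Rosetta's FADE function (lower bound, upper bound, fade
# zone, well depth, offset); treat them as an assumption and check the
# constraint file documentation.
FADE_TERMS = [
    (0, 30, 15, 4.00),
    (-99, 60, 30, 36.00),
]
# "weak" signals are one-fifth the strength of "strong" ones.
SCALE = {"strong": 1.0, "weak": 0.2}


def mohca_restraint_lines(res_i, res_j, strength="strong"):
    """Return the two constraint-file lines for one MOHCA-seq proximity signal."""
    scale = SCALE[strength]
    lines = []
    for lower, upper, zone, depth in FADE_TERMS:
        weight = depth * scale
        lines.append(
            f"AtomPair O2' {res_i} C4' {res_j} "
            f"FADE {lower} {upper} {zone} {-weight:.2f} {weight:.2f}"
        )
    return lines


if __name__ == "__main__":
    for line in mohca_restraint_lines(2, 38, "strong"):
        print(line)
    for line in mohca_restraint_lines(2, 38, "weak"):
        print(line)
```

Called with residues 2 and 38, this reproduces the four example lines above. The concatenated output for all hits can be saved as a single constraint file for upload.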
There are a couple of settings that are no longer supported in FARFAR2 that were unique to the original FARFAR webserver. If for some reason (benchmarking consistency or a special use case) you very much would like to use the 2012 force field, a particular bulge entropy score term, or permit variable bond lengths and angles, feel free to use that webserver instead.
Tips
- Do a trial run first with just a few structures, and view the molecule in PyMOL or your favorite viewer. This is particularly important if you have a multi-stranded motif where the connectivity could be a little complicated -- check that the strands are separated, and that any specified Watson-Crick pairs are reasonably paired.
- Unlike the original FARFAR webserver, FARFAR2 can handle non-standard nucleotides like 2′-O-methylated nucleotides, dihydrouridine, or pseudouridine. You will have to use the advanced interface, where their name in the supplied FASTA file will be a little more complicated: instead of a/c/g/u, pseudouridine is represented as X[PSU] (because PSU is its three-letter code).
- As with most other modes in Rosetta, the final ensemble of models is not guaranteed to be a Boltzmann ensemble.
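Because a modified residue such as X[PSU] occupies several characters in the FASTA sequence, the string length no longer equals the residue count. The following is a minimal sketch of a tokenizer for counting residues. It assumes that other modified residues follow the same X[CODE] pattern as the pseudouridine example; the function name is hypothetical.

```python
import re

# One residue is either a plain a/c/g/u or an annotated residue like X[PSU].
# Assumption: every non-standard residue is written as X[CODE].
RESIDUE = re.compile(r"X\[([A-Za-z0-9]+)\]|([acgu])")


def split_residues(sequence):
    """Split a FASTA sequence line into one entry per residue."""
    residues = []
    pos = 0
    while pos < len(sequence):
        match = RESIDUE.match(sequence, pos)
        if match is None:
            raise ValueError(f"cannot parse residue at character {pos + 1}")
        # group(1) is the bracketed code, group(2) the plain letter.
        residues.append(match.group(1) or match.group(2))
        pos = match.end()
    return residues


if __name__ == "__main__":
    print(split_residues("ggX[PSU]c"))
```

The length of the returned list is the number to compare against your residue numbering and secondary structure.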
Examples
- A simple hairpin. Sequence is:
  gggcgcaagccu
  Secondary structure [optional] could be:
  ((((....))))
  Note that we could also leave out the secondary structure and let Rosetta fragment assembly sample a lot of secondary structures.
- A string of four purine/purine pairs (a 4x4 'internal loop'; check out PDB file 283D), with the following topology:
  5′-CGAAAG-3′
     |****|
  3′-GAAAGC-5′
  The sequence input would be:
  cgaaag,cgaaag
  This is a two-strand motif, so we must specify a secondary structure, which would be:
  (....(,)....)
- A pseudoknot (check out PDB file 1L2X), with the following schematic topology:
  5′-GCGCGG---C
      |||||    A
      GCGCCUGCC
     G      |||
      AACAAACGG-3′
  The sequence is:
  gcgcggcaccguccgcggaacaaacgg
  The above is enough for the job to execute, since this is a single-strand motif. If you are interested in testing if this pseudoknot can form, just input the sequence. But if you are sure that the pseudoknot forms and you are curious about its 3D structure, you can include a secondary structure like this:
  .(((((..[[[.))))).......]]]
  Here the Watson/Crick stems are 'non-nested', so the second stem is written with square brackets to avoid ambiguity. You can specify further levels of non-nested stems with curly braces, angle brackets, and finally with lowercase letters if absolutely necessary.
Realistic RNA-Puzzle Walkthrough
We have used FARFAR2 to solve several RNA-Puzzles successfully, and those simulations can be replicated in a feature-complete way on this webserver. Let's walk through what you'd need to solve RNA-Puzzle 20, a twister sister ribozyme, the same way that the Das lab did originally.
- You will be using the "advanced" interface, because we will need to supply local template structure and a FASTA file with PDB numbering that matches.
- The actual target for RNA-Puzzle 20 was deposited as 5Y87. You can submit that PDB as the native structure for this run if you would like RMSDs to be calculated to that model immediately. Alternatively, you can omit the native for this run and you'll see RMSD as calculated to the lowest energy structure observed in the ensemble, which is often used as a heuristic to see how well the simulation converged on an answer.
- The FASTA that defines the RNA-Puzzle 20 target sequence is:
  >5y87 A:1-18
  acccgcaaggccgacggc
  >5y87 B:1-50
  gccgccgcuggugcaaguccagccacgcuucggcgugggcgcucaugggu
  The chain/numbering is specified in those header lines to ensure a match with the native structure.
- You need one starting PDB as a local template structure: the T-loop from the previously deposited twister sister ribozyme, RNA-Puzzle 19 (PDB: 5T5A). Using a structure editor like PyMOL, you can select the intercalator A:8 as well as the T-loop nucleotides A:23-26 and save them. It happens to match in sequence exactly, but if it didn't (for example, if instead of augcaa the sequence were auguaa), you could use the rna_thread server on ROSIE to thread on the correct sequence. Then, use the renumber_pdb server on ROSIE to renumber this template to "A:7 B:12-16" (the correct numbering for this T-loop in the context of Puzzle 20).
- In our original modeling, we had guessed at the secondary structure using the literature alignment of twister sister ribozymes from Breaker and our previous experience with predicting RNA-Puzzle 19 (PDB: 5T5A):
  ((((...((((((.((((,)))).)(((((.......)))))(((((....)))))))).))...))))
- Upload your starting template, FASTA, secondary structure, and (optionally) your native PDB inputs. Select a reasonably large number of models (the limit for the server is 2000) and go! In our retrospective analysis, we were able to obtain a 3.03 Å model in only 1547 total decoys, so this should work pretty well for you!
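Since the RNA-Puzzle 20 walkthrough depends on the FASTA numbering matching the template and native, a quick local check of the headers can catch off-by-one mistakes before upload. The following is a minimal sketch. It assumes each header carries a single CHAIN:first-last tag as in the walkthrough's FASTA, one sequence line per record, and only plain a/c/g/u residues. The function name is hypothetical.

```python
import re

# Matches tags like "A:1-18" in a header such as ">5y87 A:1-18".
TAG = re.compile(r"\b([A-Za-z]):(\d+)-(\d+)\b")

PUZZLE20_FASTA = """\
>5y87 A:1-18
acccgcaaggccgacggc
>5y87 B:1-50
gccgccgcuggugcaaguccagccacgcuucggcgugggcgcucaugggu
"""


def check_fasta_numbering(fasta_text):
    """Return (chain, first, last, n_residues) per record; raise on a mismatch."""
    records = []
    header = None
    for raw in fasta_text.splitlines():
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            header = line
            continue
        if header is None:
            raise ValueError("sequence line found before any header")
        match = TAG.search(header)
        if match is None:
            raise ValueError(f"no CHAIN:first-last tag in header {header!r}")
        chain = match.group(1)
        first, last = int(match.group(2)), int(match.group(3))
        if last - first + 1 != len(line):
            raise ValueError(
                f"chain {chain}: range {first}-{last} does not match "
                f"{len(line)} residues"
            )
        records.append((chain, first, last, len(line)))
        header = None
    return records


if __name__ == "__main__":
    for record in check_fasta_numbering(PUZZLE20_FASTA):
        print(record)
```

The same chain lengths should also match the two comma-separated halves of the secondary structure string.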
We welcome scientific and technical comments on our server. For support please contact us at Rosetta Forums with any comments, questions or concerns.
Modeling tools developed by the Das Lab at Stanford University. The ROSIE implementation was developed by Andrew Watkins and Sergey Lyskov.