This workshop will give an introduction to the model building tools available in CCP4 for nucleic acid structures, using data from a published RNA structure as an example. The structure we are trying to build is 1HR2 with 2 copies of a 158-residue domain at 2.25 Å resolution (Juneau et al. 2001). The map and phases we will be starting from are from the rebuilt and re-refined model in PDB-REDO, which has an R-free value of 26.4%. This map is unrealistically good compared to one that would have be obtained after either experimental phasing or molecular replacement using a homologue or a predicted model.
Click the following files to download them to your computer:
We will be using the CCP4i2 interface in this workshop.
It is also possible to perform the same tasks in CCP4 Cloud
but the step-by-step instructions would be slightly different.
Start CCP4i2 by double-clicking the desktop icon
(or by entering
ccp4i2 on the command line)
then create a new project using
File/Projects / New project.
Create a New Project window, enter a project name
and click the
Create project button.
The first task will be to import the reflection data into CCP4i2.
Import and split MTZ into experimental data objects
in the task menu.
On the input tab, use the folder icon on the right
to select the
data.mtz file that was downloaded earlier.
A window will open to ask about the provenance of the imported file.
If this is not something you wish to use
you can uncheck the option at the top of the window
to stop it being shown when importing a file.
OK to accept the default description of the source of the file.
Select all importable button to select the observations (Obs),
free-R flag (FreeR), phases (Phs) and map coefficients (MapCoeffs).
Click the run button at the top of the window to finish importing the data.
The second task to run is the
Define AU contents task.
The input page allows us to specify a list of protein and nucliec acid sequences.
+ button on the bottom left of this box
to select the
The sequence should be added automatically with the type of RNA.
With the sequence added, selecting
Experimental data in the box below
will perform a solvent content analysis
to help guess the number of copies in the asymmetric unit (AU).
There are four entries in this dropdown -
one for each reflection data object imported,
each with the name
However, all have the same cell and space group
so it doesn't matter which one is chosen as the volume of the AU is the same.
In this case we know there are two copies in the AU
so change the
Copies field in the sequence list to
and run the job to create the AU contents object.
Now we have the reflection data and AU contents description
we can try automated model building using ModelCraft.
This is a model building pipeline that combines
protein and nucelic acid model building using Buccaneer and Nautilus
with other programs for refinement, density modification and model validation.
ModelCraft task from the task menu.
Free R set
Reflection data box.
As we are providing initial phases instead of getting them from a starting model,
uncheck the option at the bottom of this box
and select the
Phases loaded earlier.
The phases are not unbiased (they come from the final model)
so do not check the option below.
AU contents created in job 2 then run the task
with all other defaults.
ModelCraft will take a while to run to completion, but results should start appearing in the report after the first cycle has completed. After it has finished, there should be a report similar to the following image. The pipeline has decided that the model from cycle 13 was the best model. It stopped because it did not produce a better model in cycles 14, 15, 16 or 17. ModelCraft built a model with 258 residues and a free R-factor of 34%. For comparison, the PDB-REDO model has 315 residues and a free-R factor of 24%. ModelCraft is better at building protein models than nucelic acid models. If it did not produce a good model I would recommend trying other automated model building programs such ARP/wARP (which can be accessed through CCP4 Cloud or the ARP/wARP Web Service) or PHENIX AutoBuild. While ModelCraft is running, move on to the next section to try interactive model building with Coot.
While ModelCraft is performing fully-automated model building,
we will introduce the interactive model building tools for nucleic acids in Coot.
Start by finding the
Manual model building - COOT task in CCP4i2.
We will just start from the map imported in job 1.
Make sure the
Electron density maps selection
is set to
and all the other selections are set to
...is not used,
then run the task to open Coot.
Coot includes automated nucleic acid building through an adapted implementation of the Nautilus program (known as Cootilus). This implementation is faster than Nautilus because it only searches for nucleic acids within a user-specified radius of the view center. If initial residue positions are found, Cootilus will attempt to grow these into longer RNA chains. It can help to repeat this process by centering the view on parts of the RNA model that are missing or built incorrectly. This tool may overwrite and join with existing nucelic acid residues. However, it does not perform sequencing and will build all nucelic acids as uridine (U) with a truncated base.
Before trying it out, turn on symmetry copies
by going to
Draw / Cell & Symmetry.
Master Switch to
20.0 and click
This will help to avoid building into symmetry copies.
Cootilus can be found at
Calculate / Other Modelling Tools / Build Nucleic Acid.
Try to build a long single chain using this tool,
but don't spend too much time on it.
You may find it struggles to build in some areas of the map.
Search radius parameter might help.
Go To Atom window (shortcut
will help to see the residues you have built already.
Also available in the
Other Modelling Tools window
Nucleotide Builder (listed as
DNA & RNA Models).
It adds a new molecule with idealised RNA or DNA
in A or B form with a specified sequence.
Rotate / Translate tool in the refinement toolbar (on the right)
can be used to get roughly the correct position
before real-space refinement and merging into the existing molecule.
If standard real-space refinement does not work to refine the ideal strand
then it may help to try jiggle fitting instead.
To jiggle fit a single residue use the shortcut key
It is also possible to perform jiggle fitting on whole chains or whole molecules.
To get these options, first open
File / Curlew (the Coot extensions wrangler)
and install the
Jiggle fitting tends to work better at lower resolution.
With higher resolution data it might help to blur the map first
Calculate / Map Sharpening/Blurring.
New adenosine residues can be added at the end of chains using the
Add Terminal Residue tool in the Refinement toolbar.
These will require some real-space refininement after they have been added.
It helps to refine the previous residue along with the new one so that
there is some flexibility in the bonding.
This can be done with the
Real Space Refine Zone tool
or by using the shortcut key
to refine the residue at the center of the view and one residue either side.
If the shortcut key does not work, you may need to first click
Edit / Settings / Install Template Keybindings.
Other Modelling Tools toolbar,
Base Pair tool
builds a nucleic acid opoosite the chosen one to form a bonded base pair.
In the example below we use it on a
and it builds a
C in roughly the right place.
Residues can be mutated individually using the
Simple Mutate tool.
Afterwards they require some refinement,
which could be done using the shortcut key
which is similar to the triple refine
but it automatically accepts the refinement result.
It is also possible to mutate a range of residues using the
Mutate Residue Range tool in the
You need to specify the molecule, chain ID,
a range of residues and the sequence that you want to mutate that range to.
Autofit the mutated residues? option
may not work currently for nucliec acids
but real-space refinement can be performed afterwards.
Automatic sequence assignment and mutation can be attempted
through an automated model building program such as Nautilus within ModelCraft.
This may alter the backbone sugar-phosphate positions
but it will also try and sequence the model.
If it manages to assign a sequence it will build the corresponding bases.
For the quickest results in ModelCraft,
Starting model that needs to be sequenced,
reduce the number of cycles from
and check the
Run a quicker basic pipeline option.
Additionally, there is a new tool for identification, assignment and validation of nucleic acid sequences called doubleHelix from Grzegorz Chojnowski, which is similar in concept to the findMySequence and checkMySequence programs for protein model sequencing. This program is not yet in the CCP4 suite, but installation instructions can be found on the GitHub repository for those who are familiar with the command line.
As with standard protein residues, standard DNA and RNA residues have well-tested restraints in the CCP4 monomer library. With this high-resolution structure we are working on, the map is good enough that we can build the structure without extra restraints. However, when building into poorer maps it may help to restrain the model more. Restraints are not binary (i.e. on or off) and the weighting is important. Structures should not be under or over restrained during refinement.
Refinement Weight can be changed in the
Refine/Regularize Control menu at the top of the refinement toolbar.
The default is
Higher numbers use restraints less (better at higher resolution)
and lower number use restraints more (better at lower resolution).
It is worth trying the
Estimate button but the estimate may not work.
If you want more restraints because your map is poor
and the estimated value is higher than 60 then overwrite it yourself.
This menu also has some general settings for real-space refinement,
e.g. to turn on
There are some restraints specific to nucelic acids available in Coot.
Calculate / Modules / Restraints option
to add a new
Restraints menu to the top menu bar.
It has options to add RNA A form restraints, DNA B form restraints
and parallel planes restraints.
LibG (available in CCP4) and doubleHelix are programs that can be used to
automatically generate base pair and base stacking restraints.
Many of the validation tools and metrics work equally well
with both protein and nucleic acid structures.
For example the
Density fit analysis graph,
which compares the average density at the atomic positions of each residue,
or the Molprobity contact analysis avalable through
(where pink dashes show bad overlaps).
Coot also has a validation tool to analyse the ribose ring puckers
which will open a window with buttons to take you to pucker outliers.
PDB-REDO validates dinucleotide conformations using the CONFAL score from DNATCO and base pair conformations using DSSR. If you create an account with PDB-REDO you can submit your own jobs to produce a re-refined version of your model with a validation report that includes these metrics.
After working on the model in Coot it is important to save your progress.
File / Save mol to CCP4i2
to choose a molecule to save to CCP4i2.
File / Save to CCP4i2 option
will save molecule number 0 in the Display Manager to CCP4i2 (if there is one).
After this you can safely close the Coot window.
If you close the Coot window before you have saved you may lose your work.
Once the Coot window has been closed the job should be marked as finished.
At the bottom of the results page are suggestions for tasks to run next.
REFMAC5 button opens a Refmac task
with the input parameters already filled in to refine the model saved from Coot.
After the Refmac task has finished,
COOT button at the bottom of that results page
will open a Coot task with the input parameters set
to open the model, map and difference map from Refmac.
This allows for a quicker cycle of refining, validating and rebuilding.
If you have time left, try to build as much of the 1HR2 structure as you can
starting from either the ModelCraft structure or your own model from Coot.
If you want to compare to the deposited structure, download
and use the
Match model to reference structure task
to move your own model onto the same symmetry copy
before opening both models in Coot.
Paul Bond, University of York, firstname.lastname@example.org
L Potterton et al. Acta Cryst. D, 74, 68 (2018)
Coot: A Casañal et al. Protein Sci., 29, 1069 (2020)