ModelCraft is an automated model-building pipeline that builds proteins and nucleic acids into X-ray crystallography or cryo-EM maps. It combines the previous Buccaneer and Nautilus pipelines and adds new steps for density modification, refinement, validation, pruning and rebuilding.
ModelCraft is available for X-ray crystallography in the CCP4i2 and CCP4 Cloud interfaces. Please see the CCP4i2 documentation and the CCP4 Cloud documentation for more information on how to run it in these interfaces.
Installation
ModelCraft is distributed with CCP4 (version 8 onwards) and can be run from the CCP4 command line. If a newer version of ModelCraft is required, it can also be installed using pip for Python 3.7 or newer, e.g.
python3 -m pip install --user modelcraft
Refer to the pip documentation if pip is not installed. In order to run ModelCraft, the CCP4 environment needs to be set up so that programs (such as Buccaneer and Refmac) can be called from the command line. For building into cryo-EM maps, the CCP-EM environment also needs to be sourced.
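Before starting a run it can be useful to confirm that the required programs are reachable from the command line. The sketch below is not part of ModelCraft and uses only the Python standard library. The executable names cbuccaneer and refmac5 are assumptions about how a typical CCP4 installation names Buccaneer and Refmac, so adjust them for your setup.

```python
import shutil


def missing_programs(names):
    """Return the names that cannot be found on the current PATH."""
    return [name for name in names if shutil.which(name) is None]


if __name__ == "__main__":
    # Assumed executable names; adjust them for your installation.
    missing = missing_programs(["cbuccaneer", "refmac5"])
    if missing:
        print("Not found on PATH:", ", ".join(missing))
    else:
        print("All required programs were found.")
```

If anything is reported as missing, the CCP4 (or CCP-EM) environment has probably not been set up in the current shell.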
Usage
The first argument must be either xray or em, for X-ray crystallography or cryo-EM respectively.
The simplest execution for X-ray crystallography requires only a description of the asymmetric unit contents (see the next section) and a reflection data file in MTZ format (with observations, a free-R flag and starting phases).
modelcraft xray --contents contents.json --data data.mtz
Alternatively, a model can be provided (in PDB, mmCIF or mmJSON format), which will be refined and used as a starting point instead of starting from phases in the data file.
modelcraft xray --contents contents.json --data data.mtz --model model.cif
For cryo-EM, either two half-maps or a single map must be provided, along with a resolution.
modelcraft em --contents contents.json --map half1.mrc half2.mrc --resolution 2.5
modelcraft em --contents contents.json --map map.mrc --resolution 2.5
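When ModelCraft is launched from a script, the commands above can be assembled as an argument list. The following is a minimal sketch that uses only the options shown in this section. The function name and its parameters are hypothetical, and it only accepts a starting model in xray mode because that is the only case shown here.

```python
def modelcraft_command(mode, contents, data=None, model=None,
                       maps=None, resolution=None):
    """Build an argument list using only the options shown above."""
    if mode not in ("xray", "em"):
        raise ValueError("mode must be 'xray' or 'em'")
    command = ["modelcraft", mode, "--contents", contents]
    if mode == "xray":
        if data is None:
            raise ValueError("xray mode needs a reflection data file")
        command += ["--data", data]
        if model is not None:
            command += ["--model", model]
    else:
        if model is not None:
            raise ValueError("this sketch only covers --model for xray")
        if not maps or len(maps) > 2 or resolution is None:
            raise ValueError("em mode needs one or two maps and a resolution")
        command += ["--map", *maps, "--resolution", str(resolution)]
    return command


if __name__ == "__main__":
    command = modelcraft_command("xray", "contents.json", data="data.mtz")
    print(" ".join(command))
    # To launch the job (requires the CCP4 environment to be set up):
    # import subprocess; subprocess.run(command, check=True)
```

Passing a list to subprocess.run avoids shell quoting problems with unusual file names.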
The command line documentation has more detailed information on individual arguments.
modelcraft xray --help
modelcraft em --help
ASU Contents Description
A description of the expected contents of the asymmetric unit must be provided as either a FASTA sequence file or a JSON file using the --contents argument.
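A sequence file is simpler, but the JSON format can also describe carbohydrates, ligands, buffers, stoichiometry, the number of copies and residue modifications, as explained in the rest of this section.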
In order to create a JSON file it may be helpful to start from the contents for an existing PDB entry. The modelcraft-contents script creates a contents JSON file for a released PDB entry.
An example JSON file is shown below:
{
  "copies": 2,
  "proteins": [
    {
      "sequence": "LPGECSVNVIPKMNLDKAKFFSGTWYETHYLDMDPQATEKFCFSFAPRESGGTVMEALYHFNVDSKV",
      "stoichiometry": 1,
      "modifications": ["M->MSE"]
    },
    {
      "sequence": "GGG"
    }
  ],
  "rnas": [
    {
      "sequence": "GGUAACUGUUACAGUUACC",
      "stoichiometry": 2,
      "modifications": ["1->GTP", "19->CCC"]
    }
  ],
  "dnas": [],
  "carbs": [
    { "codes": { "NAG": 2 }, "stoichiometry": 1 },
    { "codes": { "MAN": 1, "NAG": 2 }, "stoichiometry": 1 }
  ],
  "ligands": [
    { "code": "HEM", "stoichiometry": 1 }
  ],
  "buffers": ["GOL", "NA", "CL"]
}
The file has a list of proteins, rnas, dnas, carbs, ligands, and buffers that are in the crystal.
The only mandatory items are that each protein, RNA or DNA chain must have a sequence, each carbohydrate must have a dictionary of codes to specify the number of each sugar, and each ligand must have a single code.
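The mandatory items can be checked before a run. The sketch below is an illustration of those rules only, not ModelCraft's own validation. The function name is hypothetical and it takes a dictionary such as the one returned by json.load on a contents file.

```python
def contents_problems(contents):
    """List violations of the mandatory items described above."""
    problems = []
    # Every protein, RNA or DNA chain needs a non-empty sequence.
    for kind in ("proteins", "rnas", "dnas"):
        for i, chain in enumerate(contents.get(kind, [])):
            if not chain.get("sequence"):
                problems.append(f"{kind}[{i}] has no sequence")
    # Every carbohydrate needs a non-empty dictionary of sugar codes.
    for i, carb in enumerate(contents.get("carbs", [])):
        codes = carb.get("codes")
        if not isinstance(codes, dict) or not codes:
            problems.append(f"carbs[{i}] has no dictionary of codes")
    # Every ligand needs a single code given as a string.
    for i, ligand in enumerate(contents.get("ligands", [])):
        if not isinstance(ligand.get("code"), str):
            problems.append(f"ligands[{i}] has no single code")
    return problems


if __name__ == "__main__":
    example = {
        "proteins": [{"sequence": "GGG"}],
        "carbs": [{"codes": {"NAG": 2}}],
        "ligands": [{"code": "HEM"}],
    }
    print(contents_problems(example) or "No problems found")
```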
Each component (other than buffers) can have a stoichiometry parameter giving its number relative to the other components. In the example above there are 2 RNA chains for each protein chain. If the stoichiometry is not specified it is assumed to be 1.
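The default of 1 can be made concrete by counting the polymer chains in one copy of the contents. This is a minimal sketch with a hypothetical function name. It operates on the parsed JSON dictionary and is not taken from ModelCraft itself.

```python
def chains_per_copy(contents):
    """Count polymer chains in one copy of the contents, by kind."""
    counts = {}
    for kind in ("proteins", "rnas", "dnas"):
        # A missing stoichiometry is treated as 1, as stated above.
        counts[kind] = sum(
            item.get("stoichiometry", 1) for item in contents.get(kind, [])
        )
    return counts


if __name__ == "__main__":
    # Polymer entries from the example file, with sequences shortened.
    example = {
        "proteins": [
            {"sequence": "LPGEC", "stoichiometry": 1},
            {"sequence": "GGG"},
        ],
        "rnas": [{"sequence": "GGUAACUGUUACAGUUACC", "stoichiometry": 2}],
        "dnas": [],
    }
    print(chains_per_copy(example))
```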
There is also a copies parameter for the whole file to specify how many copies of the contents are in the asymmetric unit. If this value is not known, the most likely number will be estimated.
The modelcraft-copies script can be used to view the solvent fraction and probability for each number of copies given a contents file and an MTZ file.
It is assumed that the number of ordered buffer molecules is unknown, so they are not included in the solvent calculation.
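To show how the solvent fraction depends on the number of copies, the sketch below uses the textbook Matthews estimate. It is not necessarily the calculation that modelcraft-copies performs, and it does not compute a probability. The function and parameter names are hypothetical, and the constant 1.23 assumes a typical protein density, so it is only a rough guide when nucleic acids are present.

```python
def solvent_fraction(cell_volume, symmetry_operators, copies, mass_per_copy):
    """Textbook Matthews estimate of the solvent fraction.

    cell_volume is in cubic angstroms and mass_per_copy is in daltons.
    """
    # Matthews coefficient: cell volume per dalton of ordered contents.
    matthews = cell_volume / (symmetry_operators * copies * mass_per_copy)
    # 1.23 assumes a typical protein density; nucleic acids differ.
    return 1.0 - 1.23 / matthews


if __name__ == "__main__":
    # Made-up numbers purely for illustration.
    for copies in (1, 2, 3):
        fraction = solvent_fraction(200000.0, 4, copies, 20000.0)
        # A negative value means that many copies cannot fit in the cell.
        print(copies, round(fraction, 3))
```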
Finally, protein, RNA and DNA chains may have a list of modifications, e.g. M->MSE to specify that all methionine residues are actually selenomethionine, or 1->GTP to specify that residue 1 is guanosine triphosphate.
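The two kinds of modification string can be told apart by whether the part before the arrow is a number. The following sketch shows one way to read them. How ModelCraft interprets these strings internally is not shown here, so the function names and the assumption that residue numbers count from 1 are illustrative only.

```python
def parse_modification(text):
    """Split 'M->MSE' or '1->GTP' into a (target, code) pair.

    The target is an int for a residue number, otherwise a one-letter code.
    """
    target, separator, code = text.partition("->")
    if not separator or not target or not code:
        raise ValueError(f"not a modification: {text!r}")
    return (int(target) if target.isdigit() else target, code)


def apply_modifications(sequence, modifications):
    """Return one residue code per position after applying modifications."""
    residues = list(sequence)
    for target, code in map(parse_modification, modifications):
        for index, letter in enumerate(sequence):
            # Residue numbers are assumed to count from 1.
            if target == index + 1 or target == letter:
                residues[index] = code
    return residues


if __name__ == "__main__":
    print(apply_modifications("GGUAACUGUUACAGUUACC", ["1->GTP", "19->CCC"]))
```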
Note: ModelCraft does not yet build carbohydrates, ligands, or modified residues (other than selenomethionine derivatives). However, this is planned for the future and inclusion of these components in the contents allows for more accurate calculation of the solvent fraction during Parrot density modification in the X-ray pipeline.