Info |
---|
The amount of GPU memory required by AlphaFold and OpenFold increases quadratically with the number of amino acids in the system being modeled. If you are modeling a protein complex longer than about 1500 residues, please contact the engineering team before starting, as the project will likely require a larger-than-normal GPU. |
The AI Folding API provides a common interface to AI-based protein structure prediction tools. The API currently supports OpenFold (https://github.com/CyrusBiotechnology/openfold), AlphaFold (https://github.com/deepmind/alphafold#alphafold-output), and ESMFold. The API runs a post-processing protocol on the results to minimize the output structure in the Rosetta energy function and correct any atomic-level errors in sidechain positioning (see Notes for more detail).
Quickstart
Model a monomer using AlphaFold
Code Block |
---|
cyrus engine submit ai-folding input.fasta --mode=monomer --ai-tool=alphafold |
Model multiple monomers in parallel using OpenFold
Code Block |
---|
cyrus engine submit ai-folding input.fasta input2.fasta --mode=monomer --ai-tool=openfold |
Model a monomer using OpenFold with weights trained by DeepMind (AlphaFold)
Default = OpenFold weights
Code Block |
---|
cyrus engine submit ai-folding input.fasta --mode=monomer --ai-tool=openfold --model-sets=alphafold |
Create a model using AlphaFold's SingleSeq mode with 2 recycles
Code Block |
---|
cyrus engine submit ai-folding input.fasta --mode=singleseq --ai-tool=alphafold --af-n-recycles=2 |
Model a multimer (AlphaFold only)
Code Block |
---|
cyrus engine submit ai-folding input.fasta --mode=multimer --ai-tool=alphafold |
Inputs
FASTA file containing sequence(s) of interest to model.
Options
--ai-tool
AI folding tool to run (openfold, alphafold, esmfold)
default = alphafold
--mode
Mode to run with AI tool (monomer, multimer, singleseq)
default = monomer
--model-sets
The set of model weights to use with OpenFold (alphafold, openfold) (see Notes for more detail)
default = openfold
--existing-model-data
Location of existing model data in GCS
default = null
--precomputed-alignments
Directory path to precomputed alignments that will be uploaded and used for AlphaFold jobs
default = null
--run-relax
Enable or disable the Rosetta relax phase of post-processing
default = false
--gpu-type
Select the GPU type to use (t4, a100) (see Notes for more detail)
default = t4
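The allowed values listed above can be checked before a job is submitted. The following is a minimal sketch, not part of the cyrus CLI: `ai_folding_cmd` is a hypothetical helper name, it only prints the command rather than running it, and the rule that multimer mode requires AlphaFold is taken from the Quickstart heading "Model a multimer (AlphaFold only)".

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of the cyrus CLI): validates option values
# against the lists documented on this page and prints the submit command.
# Arguments: FASTA path, then optional ai-tool, mode, and gpu-type.
ai_folding_cmd() {
  local fasta="$1" tool="${2:-alphafold}" mode="${3:-monomer}" gpu="${4:-t4}"

  case "$tool" in
    openfold|alphafold|esmfold) ;;
    *) echo "unknown --ai-tool: $tool" >&2; return 1 ;;
  esac
  case "$mode" in
    monomer|multimer|singleseq) ;;
    *) echo "unknown --mode: $mode" >&2; return 1 ;;
  esac
  case "$gpu" in
    t4|a100) ;;
    *) echo "unknown --gpu-type: $gpu" >&2; return 1 ;;
  esac

  # The Quickstart lists multimer modeling as AlphaFold only.
  if [ "$mode" = "multimer" ] && [ "$tool" != "alphafold" ]; then
    echo "--mode=multimer requires --ai-tool=alphafold" >&2
    return 1
  fi

  echo "cyrus engine submit ai-folding $fasta --mode=$mode --ai-tool=$tool --gpu-type=$gpu"
}
```

For example, `ai_folding_cmd input.fasta alphafold multimer a100` prints a complete submit command, while `ai_folding_cmd input.fasta openfold multimer` returns an error before anything is submitted.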
Outputs
alignments (directory): Alignment data relevant to AI tool predictions
predictions (directory): AI tool model predictions
initial_molprobity_reports (directory): MolProbity report for models output by AI tool
rosetta_relaxed_models (directory): Rosetta relaxed AI tool models
final_molprobity_reports (directory): MolProbity report for relaxed models
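Once results are available locally, a quick check can confirm which of the output directories are present. This is a minimal sketch under stated assumptions: `missing_outputs` is a hypothetical helper name, and it assumes the job results have already been copied into a single local directory with the directory names listed above as immediate subdirectories. How results are downloaded is not covered here.

```shell
#!/usr/bin/env bash
# Hypothetical helper: prints the name of each documented output directory
# that is missing from a local copy of the job results (one per line).
# Assumes the results were copied into "$1" with the documented names.
missing_outputs() {
  local results_dir="$1" name
  for name in alignments predictions initial_molprobity_reports \
              rosetta_relaxed_models final_molprobity_reports; do
    [ -d "$results_dir/$name" ] || echo "$name"
  done
}
```

An empty result means all five directories were found. The two relax-related directories may be absent when the relax phase did not run; that is an assumption, not documented behavior.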
Notes
Model weight sets
alphafold - weights trained by DeepMind
openfold - weights trained by the AlQuraishi Lab for OpenFold
API Post-Processing
The API post-processing protocol consists of the following three steps:
1. Generate a MolProbity report for the models output from the AI tool.
2. Idealize and relax the models output from the AI tool with Rosetta.
3. Generate a MolProbity report for the relaxed models.
Modeling large proteins
The amount of GPU memory required increases quadratically with the number of amino acids in the system being modeled, so doubling the sequence length roughly quadruples the memory needed. If you are modeling a protein longer than about 1500 residues, add the following option to the ai-folding submit command: --gpu-type=a100
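The residue count can be read from the FASTA file before submitting. The sketch below is illustrative only: `count_residues` and `pick_gpu_type` are hypothetical helper names, the 1500-residue cutoff is the approximate guideline from this page rather than a hard limit, and residues are summed over every sequence in the file because memory depends on the whole system being modeled.

```shell
#!/usr/bin/env bash
# Count residues in a FASTA file: drop header lines (starting with '>'),
# strip all whitespace, and count the remaining characters across all
# sequences in the file.
count_residues() {
  grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' '
}

# Suggest a GPU type using the ~1500-residue guideline from this page.
pick_gpu_type() {
  local n
  n="$(count_residues "$1")"
  if [ "$n" -gt 1500 ]; then
    echo a100
  else
    echo t4
  fi
}
```

The suggestion can then be passed to the documented option, for example `cyrus engine submit ai-folding input.fasta --gpu-type="$(pick_gpu_type input.fasta)"`.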