Info

The amount of GPU memory required by AlphaFold and OpenFold increases quadratically with the number of amino acids in the system being modeled. If you are modeling a protein complex longer than about 1500 residues, please contact the engineering team before starting, as the project will likely require a larger-than-normal GPU.

Using the API

The AI Folding API provides a common interface to AI-based protein structure prediction tools. The API currently supports OpenFold (https://github.com/CyrusBiotechnology/openfold), AlphaFold (https://github.com/deepmind/alphafold#alphafold-output), and ESMFold. The API runs a post-processing protocol on the results to minimize the output structure in the Rosetta energy function and correct any atomic-level errors in sidechain positioning (see Notes for more detail).

Quickstart

Model a monomer using AlphaFold

Code Block
cyrus engine submit ai-folding input.fasta --mode=monomer --ai-tool=alphafold

Model multiple monomers in parallel using OpenFold

Code Block
cyrus engine submit ai-folding input.fasta input2.fasta --mode=monomer --ai-tool=openfold

Currently, OpenFold does not support multichain modeling.

Model a monomer using OpenFold with weights trained by DeepMind (AlphaFold)

Default = OpenFold weights

Code Block
cyrus engine submit ai-folding input.fasta --mode=monomer --ai-tool=openfold --model-sets=alphafold

Create a model using AlphaFold's SingleSeq mode with 2 recycles

Code Block
cyrus engine submit ai-folding input.fasta --mode=singleseq --ai-tool=alphafold --af-n-recycles=2

Model a multimer (AlphaFold only)

Code Block
cyrus engine submit ai-folding input.fasta --mode=multimer --ai-tool=alphafold

Inputs

FASTA file containing sequence(s) of interest to model.
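As a quick check before submitting, the sketch below counts the records and residues in a FASTA file. It assumes the standard FASTA layout (a header line starting with `>` followed by one or more sequence lines). The helper names `fasta_records` and `fasta_residues` are hypothetical and are not part of the cyrus CLI.

```shell
# Hypothetical helpers (not part of the cyrus CLI) for inspecting a FASTA file.

# Print the number of records, i.e. header lines starting with '>'.
fasta_records() {
    grep -c '^>' "$1"
}

# Print the total number of residues: every non-whitespace character
# on lines that are not headers.
fasta_residues() {
    grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' '
}
```

Running `fasta_residues input.fasta` before a submission gives the sequence length to compare against the GPU memory guidance in the Notes.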

Options

  • --ai-tool

    • AI folding tool to run (openfold, alphafold, esmfold)

    • default = alphafold

  • --mode

    • Mode to run with AI tool (monomer, multimer, singleseq)

    • default = monomer

  • --model-sets

    • The set of model weights to use with OpenFold (alphafold, openfold) (See Notes for more detail)

    • default = openfold

  • --existing-model-data

    • Location of existing model data in GCS

    • default = null

  • --precomputed-alignments

    • Directory path to precomputed alignments that will be uploaded and used for AlphaFold jobs

    • default = null

  • --run-relax

    • Enable or disable the Rosetta relax phase of post-processing

    • default = false

  • --gpu-type

    • Select the GPU type to use (t4, a100) (See Notes for more detail)

    • default = t4
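The documented option values can be checked before a job is submitted. The wrapper below is a minimal sketch: `ai_folding_cmd` is a hypothetical helper, not part of the cyrus CLI, and it prints the submit command rather than running it. It assumes only the tool and mode values listed above, plus the Quickstart statement that multimer modeling is AlphaFold only; the CLI itself may enforce different or additional rules.

```shell
# Hypothetical wrapper (not part of the cyrus CLI): validate the option
# values documented above and print the resulting submit command.
# Usage: ai_folding_cmd FASTA TOOL MODE
ai_folding_cmd() {
    fasta=$1
    tool=$2
    mode=$3

    case $tool in
        openfold|alphafold|esmfold) ;;
        *) echo "unknown --ai-tool: $tool" >&2; return 1 ;;
    esac
    case $mode in
        monomer|multimer|singleseq) ;;
        *) echo "unknown --mode: $mode" >&2; return 1 ;;
    esac
    # Multimer modeling is documented as AlphaFold only.
    if [ "$mode" = multimer ] && [ "$tool" != alphafold ]; then
        echo "--mode=multimer requires --ai-tool=alphafold" >&2
        return 1
    fi

    echo "cyrus engine submit ai-folding $fasta --mode=$mode --ai-tool=$tool"
}
```

For example, `ai_folding_cmd input.fasta alphafold multimer` prints the multimer command from the Quickstart, while `ai_folding_cmd input.fasta openfold multimer` reports an error and returns a non-zero status.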

Outputs

  • alignments (directory)

    • Alignment data relevant to AI tool predictions

  • predictions (directory)

    • AI tool model predictions

  • initial_molprobity_reports (directory)

    • Molprobity report for models output by AI tool

  • rosetta_relaxed_models (directory)

    • Rosetta relaxed AI tool models

  • final_molprobity_reports (directory)

    • Molprobity report for relaxed models
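Once results have been copied to a local directory, the listing above can be turned into a simple completeness check. This is a sketch only: `check_ai_folding_outputs` is a hypothetical helper, not part of the cyrus CLI, and it assumes the output directories sit directly under the results directory. Whether the relax-related directories are produced when --run-relax is disabled is not stated here, so a "missing" report for those may be expected.

```shell
# Hypothetical check (not part of the cyrus CLI): report which of the
# documented output directories are missing from a local results directory.
# Prints one missing name per line; returns non-zero if any are missing.
check_ai_folding_outputs() {
    results_dir=$1
    missing=0
    for name in alignments predictions initial_molprobity_reports \
                rosetta_relaxed_models final_molprobity_reports; do
        if [ ! -d "$results_dir/$name" ]; then
            echo "$name"
            missing=1
        fi
    done
    return "$missing"
}
```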

Notes

Model weight sets

  • alphafold - weights trained by DeepMind

  • openfold - weights trained by the AlQuraishi Lab for OpenFold

API Post-Processing

The API post-processing protocol consists of the following three steps:

  1. Generate a Molprobity report for the models output from the AI tool.

  2. Idealize and relax the models output from the AI tool with Rosetta.

  3. Generate a Molprobity report for the relaxed models.

Alongside the raw output from the AI tool, the API will produce the following directories:

  • initial_molprobity_reports – Molprobity reports from step 1 of the post-processing

  • rosetta_relaxed_models – relaxed models from step 2

  • final_molprobity_reports – Molprobity reports for the relaxed models from step 3

Modeling large proteins

The amount of GPU memory required increases quadratically with the number of amino acids in the system being modeled. If you are modeling a protein longer than about 1500 residues, add the following option to the ai-folding submit command: --gpu-type=a100
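The rule of thumb above can be applied automatically. The helper below is a sketch: `suggest_gpu_type` is a hypothetical name, not part of the cyrus CLI. It assumes a standard FASTA file and treats the approximate 1500-residue figure as a hard cutoff on the total residue count across all records.

```shell
# Hypothetical helper (not part of the cyrus CLI): pick a --gpu-type value
# from the total residue count of a FASTA file.
# The 1500-residue cutoff is the approximate figure quoted above.
suggest_gpu_type() {
    # Count non-whitespace characters on all non-header lines.
    residues=$(grep -v '^>' "$1" | tr -d '[:space:]' | wc -c | tr -d ' ')
    if [ "$residues" -gt 1500 ]; then
        echo a100
    else
        echo t4
    fi
}
```

The result can be substituted into a submit command, for example `--gpu-type=$(suggest_gpu_type input.fasta)`. Because memory grows quadratically, a sequence twice as long needs roughly four times the memory, so sequences near the cutoff may still be worth running on the larger GPU.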

References

AlphaFold GitHub

OpenFold GitHub

ESMFold GitHub