RfDiffusion

The RfDiffusion API runs RfDiffusion on an input template PDB file. RfDiffusion is a method for protein structure generation, with or without conditioning information, and is useful for protein design tasks. It supports motif scaffolding, unconditional protein generation, symmetric motif scaffolding, binder design, and more (see References).

Quickstart

RfDiffusion can be run with the following command:

cyrus engine submit rf-diffusion input-template.pdb --n-rfdiffusion-designs 23 --n-mpnn-designs 3 --rfdiffusion-contigs A1-25/25-25 --mpnn-af2-contigs A1-25/25-25

(See Notes for more information on RfDiffusion contigs and the workflow.)
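When many submissions are scripted, it can help to assemble the Quickstart command programmatically. The following is a minimal sketch that uses only the flags shown above; the helper name `build_rfdiffusion_command` is hypothetical and not part of the cyrus tooling, and the sketch only builds and prints the command rather than submitting it.

```python
import shlex


def build_rfdiffusion_command(template, n_rfdiffusion_designs, n_mpnn_designs,
                              rfdiffusion_contigs, mpnn_af2_contigs):
    """Return the Quickstart command as an argv list (hypothetical helper)."""
    return [
        "cyrus", "engine", "submit", "rf-diffusion", template,
        "--n-rfdiffusion-designs", str(n_rfdiffusion_designs),
        "--n-mpnn-designs", str(n_mpnn_designs),
        "--rfdiffusion-contigs", rfdiffusion_contigs,
        "--mpnn-af2-contigs", mpnn_af2_contigs,
    ]


argv = build_rfdiffusion_command(
    "input-template.pdb", 23, 3, "A1-25/25-25", "A1-25/25-25"
)

# Print a shell-safe rendering of the command for inspection.
print(shlex.join(argv))
```

To actually submit, the same list could be passed to `subprocess.run(argv, check=True)`, which avoids any shell quoting issues because no shell is involved.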

Inputs

  • --template (str)

    • Input template PDB file

  • --mpnn-af2-contigs

    • Select the contigs for the MPNN/AF2 runs

  • --n-mpnn-designs

    • Number of ProteinMPNN designs

  • --n-rfdiffusion-designs

    • Number of RfDiffusion designs

  • --rfdiffusion-contigs

    • Contigs for RfDiffusion runs

Options

  • --mpnn-model-name

    • MPNN model name to use

    • default = v_48_020

  • --mpnn-model-source

    • MPNN model source [soluble, original]

    • default = soluble

Outputs

  • outputs (directory)

    • Directory containing results of RfDiffusion and MPNN/AF2 runs.

Notes

Workflow

 

RfDiffusion Contigs

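The contig flags are discussed at length in the RFdiffusion repository README, and the explanation below comes from that README. It describes RFdiffusion's own hydra-based command line; in the Quickstart command above, the contig string is passed directly to --rfdiffusion-contigs and --mpnn-af2-contigs without the 'contigmap.contigs=[...]' wrapper.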

What does 'contigmap.contigs=[150-150]' mean? To those who have used RFjoint inpainting, this will look familiar but slightly different. Diffusion uses the same 'contig mapper' as inpainting, but because RFdiffusion is configured with hydra, the contig string must be passed as a single item in a list rather than as a bare string, and the entire argument MUST be enclosed in single quotes ('') so that the command line does not attempt to parse any of the special characters.
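The quoting requirement can be illustrated with Python's standard `shlex` module, which applies the same single-quote wrapping when an argument contains characters the shell would otherwise interpret. This is only a sketch of the quoting rule, not part of RFdiffusion or the cyrus CLI.

```python
import shlex

# The square brackets are special to the shell (glob patterns),
# so the argument must be quoted before it is pasted into a command line.
contig_arg = "contigmap.contigs=[150-150]"

quoted = shlex.quote(contig_arg)
print(quoted)  # prints 'contigmap.contigs=[150-150]' including the single quotes
```

When a command is launched from Python as an argument list (for example with `subprocess.run`), no shell is involved and the unquoted string can be passed as-is.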

The contig string allows you to specify a length range, but here we just want a protein 150 aa in length, so we specify [150-150]. In the README example this is taken from, the command then runs 10 diffusion trajectories, saving the outputs to the specified output folder.

In more detail, if we want to scaffold a motif, the input is just like RFjoint inpainting, except for the need to navigate the hydra config input. If we want to scaffold residues 10-25 on chain A of a PDB, this would be done with 'contigmap.contigs=[5-15/A10-25/30-40]'. This asks RFdiffusion to build 5-15 residues (randomly sampled at each inference cycle) N-terminal to A10-25 from the input PDB, followed by 30-40 residues (again, randomly sampled) at its C-terminus.
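To make the contig semantics concrete, here is a minimal sketch of how a contig string such as [5-15/A10-25/30-40] could be split into segments and how one set of lengths might be sampled. It is an illustration only, not RFdiffusion's actual contig mapper: the names `ContigSegment`, `parse_contig`, and `sample_lengths` are hypothetical, and the sketch handles only the simple forms shown in this section (length ranges and single-chain residue ranges), not any other syntax described in the README.

```python
import random
import re
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class ContigSegment:
    chain: Optional[str]  # chain ID for a motif from the input PDB, None for new residues
    start: int            # first residue number (motif) or minimum length (new residues)
    end: int              # last residue number (motif) or maximum length (new residues)


# Optional single-letter chain, a number, and an optional "-number" upper bound.
_SEGMENT = re.compile(r"^([A-Za-z]?)(\d+)(?:-(\d+))?$")


def parse_contig(contig: str) -> List[ContigSegment]:
    """Split a contig string into segments separated by '/'."""
    segments = []
    for token in contig.strip().strip("[]").split("/"):
        match = _SEGMENT.match(token)
        if match is None:
            raise ValueError(f"unsupported contig segment: {token!r}")
        chain = match.group(1) or None
        low = int(match.group(2))
        high = int(match.group(3) or match.group(2))
        if high < low:
            raise ValueError(f"range end before start in segment: {token!r}")
        segments.append(ContigSegment(chain, low, high))
    return segments


def sample_lengths(segments: List[ContigSegment], rng: random.Random) -> List[int]:
    """Pick one length per segment: fixed for motifs, random within range otherwise."""
    lengths = []
    for seg in segments:
        if seg.chain is None:
            lengths.append(rng.randint(seg.start, seg.end))  # inclusive range
        else:
            lengths.append(seg.end - seg.start + 1)  # motif residues are kept as-is
    return lengths


segments = parse_contig("[5-15/A10-25/30-40]")
lengths = sample_lengths(segments, random.Random(0))
print(segments)
print(lengths, "total length:", sum(lengths))
```

In this sketch the motif A10-25 always contributes 16 residues, while the two flanking segments vary between runs, so the total design length falls between 51 and 71 residues. The same parser applied to the Quickstart contig A1-25/25-25 gives a 25-residue motif followed by exactly 25 new residues.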

References

Broadly applicable and accurate protein design by integrating structure prediction networks and diffusion generative models