Sequence PTM Prediction 

Sequence-based predictions follow industry standards for flagging known PTM motifs. Input sequences are scanned for canonical deamidation, isomerization, and glycosylation motifs.

Weights are applied according to the motif hierarchy for deamidation and isomerization, giving each hit a relative score for how likely the modification is to occur. Oxidation is not included because predicting it requires structural parameters.
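The motif scan described above can be sketched as follows. This is a minimal illustration, not the API's implementation: the motif order is taken from the Notes section, the N-X-S/T (X not P) sequon is the standard N-linked glycosylation motif, and the linear rank weights and the function names (rank_weights, scan_sequence) are assumptions made for this example only.

```python
import re

# Motif order follows the hierarchy given in the Notes (highest rate first).
DEAMIDATION_MOTIFS = ["NG", "NS", "NN", "NH"]
ISOMERIZATION_MOTIFS = ["DG", "DS", "DD", "DT", "DH"]
# Standard N-linked glycosylation sequon N-X-S/T, X != P (lookahead allows overlaps).
GLYCOSYLATION_SEQUON = re.compile(r"(?=(N[^P][ST]))")


def rank_weights(motifs):
    """Hypothetical weights: 1.0 for the top motif, decreasing linearly with rank."""
    n = len(motifs)
    return {motif: (n - i) / n for i, motif in enumerate(motifs)}


def scan_sequence(seq):
    """Return (ptm, motif, position, score) tuples; positions are 1-based."""
    seq = seq.upper()
    tables = {
        "deamidation": rank_weights(DEAMIDATION_MOTIFS),
        "isomerization": rank_weights(ISOMERIZATION_MOTIFS),
    }
    hits = []
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        for ptm, weights in tables.items():
            if pair in weights:
                hits.append((ptm, pair, i + 1, weights[pair]))
    # Glycosylation sequons are flagged without a score in this sketch.
    for match in GLYCOSYLATION_SEQUON.finditer(seq):
        hits.append(("n_glycosylation", match.group(1), match.start() + 1, None))
    return hits
```

Calling scan_sequence on a short sequence returns one tuple per flagged motif; the actual scores reported by the API may use different weights.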

...

Cyrus predictive models for Asn deamidation and Met photooxidation match the performance reported in the literature.

Baseline model performance was reproduced and benchmarked using available supplemental training and validation data.

Alternative feature sets were tested and compared to baseline models to better target predictors of importance during feature selection. 

  • Rosetta-generated structural data was substituted into the training and test data using available crystal structures [Deamidation only]

  • Feature sets were reduced based on top predictors described in literature and observed in model optimization [Deamidation and oxidation]

Running PTM prediction

PTM prediction can be run with the following command:

cyrus run ptm-prediction input.fasta

The job takes either a FASTA or a PDB file as input.

Flags:

Code Block
"--offset" adjust output residue numbering from pose to original numbering scheme
"--raw" output raw prediction data for each PTM

...

FASTA inputs return reports of motif hits, scored by degradation propensity according to the motif hierarchy.

...

The post-translational modification (PTM) predictor API keeps pace with industry standards for in silico liability prediction in therapeutic development. Maintaining therapeutic protein stability and potency during development remains a costly and significant challenge because of degradation by PTMs. Early screening for liabilities is critical for reducing development costs and enabling downstream success. The ptm-predictor API provides both sequence- and structure-based predictions (see Notes for more information).

Currently, the ptm-predictor API provides predictions for:

  • Asparagine Deamidation

  • Aspartic Acid Isomerization

  • Methionine Oxidation

  • Hyper-reactive Cysteines

  • N-Linked Glycosylation

  • Lysine Glycation

  • Pyroglutamylation

  • N-Term Cyclization

  • C-Term Lysine Processing

  • For more sophisticated N-linked glycosylation prediction, use our Glycosylation Prediction API

  • While the predictions are trained on and informed by experimental data and observations from the literature, the recommendations provided should supplement rather than replace the user's domain expertise and insights.

Info

ptm-prediction is in Beta – future versions aim to further optimize models and reporting methods

Quickstart

Flag known PTM motifs for a given sequence

Code Block
cyrus engine submit ptm-prediction --fasta-file input.fasta 

Predict PTMs for multiple FASTA files

Code Block
cyrus engine submit ptm-prediction --fasta-file *.fasta

Predict PTMs for a given structure and return results with residue number offset by 12

Code Block
cyrus engine submit ptm-prediction --pdb-file input.pdb --offset 12

Predict PTMs for multiple structures

Code Block
cyrus engine submit ptm-prediction --pdb-file input1.pdb input2.pdb

Inputs

Submitting a ptm-prediction job requires either FASTA or PDB inputs (not both). Results for FASTA inputs do not include oxidation predictions, since these require structural features. Results for PDB inputs include both sequence- and structure-based predictions.

Sequence PTM Prediction

  • --fasta-file (str)

    • FASTA file(s) containing sequence(s) of interest

    • One or more FASTA files may be submitted

      • *.fasta or input1.fasta input2.fasta ...

      • When multiple FASTA files are provided, they are combined into one FASTA file for submission; therefore it is important to properly label input sequences using the FASTA header lines.
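Because multiple FASTA files are combined into one submission, it can help to check header lines before submitting. The sketch below is a local pre-check only and is not part of the cyrus CLI; the function names (read_fasta, find_header_problems) are hypothetical.

```python
def read_fasta(text):
    """Parse FASTA text into a list of (header, sequence) pairs."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            # A new header closes the previous record, if any.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:].strip(), []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def find_header_problems(fasta_texts):
    """Return headers that are empty or repeated across all input texts."""
    seen, problems = set(), []
    for text in fasta_texts:
        for header, _ in read_fasta(text):
            if not header or header in seen:
                problems.append(header)
            seen.add(header)
    return problems
```

An empty result from find_header_problems means every sequence carries a distinct, non-empty label, so the combined report IDs can be traced back to the right input.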

Structural PTM Prediction

  • --pdb-file (str)

    • Input PDB file(s) for predictions

    • One or more PDB files may be submitted.

    • Ideally, the PDB is cleaned using the Clean PDB API and does not have regions of missing density, as these would reduce the quality of results.

Options

  • --offset (int64)

    • During feature extraction, the API internally converts every input sequence and structure to sequential numbering (the first residue is 1). Results are reported with this numbering by default.

    • Providing an integer N for this option shifts the residue numbers in output reports by N, so that they can be matched to the original numbering scheme.
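As a rough illustration of the renumbering, the sketch below assumes the offset is simply added to the internal 1-based residue number. That additive behavior is an assumption, not something confirmed here, so it is worth checking a known residue in the output report; the function names are hypothetical.

```python
def apply_offset(internal_number, offset=0):
    """Shift a 1-based internal residue number by `offset` (assumed additive)."""
    if internal_number < 1:
        raise ValueError("internal residue numbers start at 1")
    return internal_number + offset


def offset_for_first_residue(original_first_residue):
    """Offset that maps internal residue 1 onto the original first residue number."""
    return original_first_residue - 1
```

Under this assumption, a structure whose original numbering starts at residue 13 would need an offset of 12 for the report to match the original scheme.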

Outputs

All output files (listed below) will be returned as a tarball output.tgz.

  • ptm_report.csv

    • CSV file containing the following columns:

      • ID - sequence or structure ID

      • ptm - PTM being reported

      • N - number of residues or motifs predicted from sequence/structure

      • hits - List of residue or motif numbers for predicted PTM

        • For Asn deamidation and Asp isomerization, hits report the specific motif, because experimental data attribute PTM liability risk to the N+1 residue. For example, NG_67 indicates that the asparagine at position 67 is at risk of deamidation and that the N+1 residue is a glycine (see Notes for more details).

  • ptm_report_<ID>.md

    • Markdown file containing a formalized report on the predicted PTMs for each sequence/structure (identified by ID).

    • Markdown files can be visualized in most IDEs or converted to a preferred text format by the user.

    • The report contains a summary of the predicted hits, and more detailed descriptions for each PTM including background on how it may manifest, potential mitigation strategies, and details on how it was predicted.

  • <ID>_ptm_report.pml - only for PDB inputs

    • When input structures are provided for ptm-prediction, predicted labile residues will be mapped onto the structures using an automatically generated PyMOL script.

    • The script will color code residues as sticks and create individual scenes for each PTM predicted.

    • See Notes for more details

  • *.pdb - only for PDB inputs

    • For convenience, input PDBs will be included in the output data packet so that the generated PyMOL script will work directly in the output directory without concern for local paths to input PDB files.
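For downstream processing, the hits column of ptm_report.csv can be split into motif and residue number. The sketch below uses the documented column names (ID, ptm, N, hits) and the documented NG_67 label style; how multiple hits are delimited within a cell is not specified here, so the regular expression simply collects every motif_number token it finds. Hits reported as plain residue numbers are not handled, and the function names are hypothetical.

```python
import csv
import io
import re

# Matches labels such as NG_67: a motif in capital letters, an underscore, a residue number.
HIT_PATTERN = re.compile(r"\b([A-Z]+)_(\d+)\b")


def parse_hits(cell):
    """Extract (motif, residue_number) pairs such as ('NG', 67) from a hits cell."""
    return [(motif, int(number)) for motif, number in HIT_PATTERN.findall(cell)]


def load_report(csv_text):
    """Read report rows using the documented column names."""
    rows = []
    for row in csv.DictReader(io.StringIO(csv_text)):
        rows.append({
            "ID": row["ID"],
            "ptm": row["ptm"],
            "N": int(row["N"]),
            "hits": parse_hits(row["hits"]),
        })
    return rows
```

To use it on real output, read ptm_report.csv from the extracted output.tgz and pass the file contents to load_report.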

Notes

Visualizing Liable Residues in PyMOL

Requires an up-to-date, valid PyMOL license to run.

  • The PTM predictor API, when running the structural PTM predictions, will automatically generate a .pml script to visualize predicted liable residues on the input structure in PyMOL.

  • To visualize the predicted PTM residues, run the following command:

Code Block
pymol <ID>_ptm_report.pml

  • Residues will be color coded and shown as sticks in separate scenes, for example:

    • Deamidation: color = cyan (util.cbac), selection/scene = deamidation_pred

    • Oxidation: color = magenta (util.cbam), selection/scene = oxidation_pred

  • When the session starts, the view shows all predictions (scene = all_predictions)

  • Clicking a specific PTM scene switches the view to show only the residues for that PTM

  • See References for a tutorial on running PyMOL

Asn Deamidation and Asp Isomerization

  • Asparagine deamidation occurs in three potential pathways:

    • Nucleophilic attack by backbone carbonyl group

    • Nucleophilic attack by the backbone nitrogen of the N+1 residue

    • Direct hydrolysis

  • Canonical motifs for deamidation are NG, NS, NN and NH in order of high to low deamidation rate.

  • Aspartic acid isomerization occurs through a pathway related to Asn deamidation, but typically occurs at higher rates at low pH.

  • Canonical motifs for isomerization are DG, DS, DD, DT, and DH in order of high to low isomerization rate.

  • Deamidation and isomerization share the same structural attributes for prediction, including the N+1 residue, solvent accessibility, dihedral angles, and nucleophilic attack distance.

Detection of Asp isomerization by mass spectrometry is challenging because isoAsp and Asp have the same molecular mass, so little experimental data is available. Therefore, isomerization prediction in the API is currently limited to sequence-based flagging.
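One of the structural attributes listed above, the nucleophilic attack distance, can be illustrated with a short calculation. The sketch assumes the distance is measured between the Asn side-chain carbonyl carbon (CG) and the backbone nitrogen of the N+1 residue; the exact feature definition used by the API is not stated here, and the function name is hypothetical.

```python
import math


def attack_distance(asn_cg, next_backbone_n):
    """Euclidean distance between two (x, y, z) atom coordinates.

    The result is in the units of the input coordinates (angstroms for PDB files).
    Atom choice (Asn CG and N+1 backbone N) is an assumption for this sketch.
    """
    return math.dist(asn_cg, next_backbone_n)
```

The coordinates would come from the ATOM records of the input PDB for the flagged residue and its N+1 neighbor.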

References

Running PyMOL Scripts