To improve the success of, and confidence in, designed glycosylation sites, Cyrus has implemented glyco-predictor, an in-house version of DeepNGlyPred (DNGP), a deep neural network (DNN) tool for sequence-based prediction of human N-linked glycosylation.

Quickstart

Run predictions on a given FASTA sequence:

cyrus engine submit glyco-predictor input.fasta

The API takes approximately an hour to run and generates a single CSV report listing the N-linked glycosylation predictions.

Inputs

--fasta-file (str)
- Input FASTA file containing the sequence of interest

Outputs

dngp-report.csv
- CSV file containing the positions of glycosylation motifs within the input sequence and the predicted glycosylation propensity.

Notes

Feature selection

The best features for predicting glycosylation were structure-based features determined by NetSurfP-3.0. The optimal window size for NetSurfP-3.0 predictions depended on the training set (N-GlycositeAtlas: 41; N-GlyDE: 25). Window size is defined as the total number of residues in the window: the central asparagine of the glycosylation motif (N-X-[S/T]) plus the flanking residues surrounding it.
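The window definition above can be illustrated with a minimal sketch. It is not the glyco-predictor implementation: the function name `sequon_windows`, the symmetric split of the window (for example, 20 residues on each side for a window of 41), and the `-` padding at the sequence termini are all assumptions made for illustration. The pattern follows the motif as written here, N-X-[S/T], with no restriction on X.

```python
import re

# Motif as written in the text: N-X-[S/T]. The lookahead keeps the match
# at the asparagine only, so overlapping motifs (e.g. "NNTS") are all found.
SEQUON = re.compile(r"N(?=[A-Z][ST])")


def sequon_windows(sequence, window_size, pad="-"):
    """Return (1-based position, window) for each N-X-[S/T] asparagine.

    Hypothetical helper: assumes an odd window centered on the asparagine
    and pads the termini with `pad` so every window has the same length.
    """
    if window_size % 2 == 0:
        raise ValueError("window_size must be odd so the asparagine is central")
    flank = window_size // 2
    seq = sequence.upper()
    padded = pad * flank + seq + pad * flank
    windows = []
    for match in SEQUON.finditer(seq):
        start = match.start()  # 0-based index of N in the unpadded sequence
        # In the padded string the N sits at start + flank, so the window
        # centered on it begins at index `start`.
        windows.append((start + 1, padded[start:start + window_size]))
    return windows
```

Calling `sequon_windows(seq, 41)` or `sequon_windows(seq, 25)` would give windows matching the two sizes mentioned above, one per candidate site.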

To best replicate the structural features encoded in the DNGP models, NetSurfP-3.0 had to be developed internally, since pre-trained models are not readily available. An internal NetSurfP-3.0 model was implemented for workflow use with publicly available training data; note, however, that the original DNGP models were trained on data generated by NetSurfP-2.0. The main difference between the versions lies in the underlying architecture, which was changed for speed (NetSurfP-3.0 uses an ESM model to improve runtime performance); other performance metrics were reported to show no significant differences.

Performance

Table 1 summarizes the accuracy, precision, recall, and specificity of the Cyrus glyco-predictor tool on different benchmarks. The N-GlyDE dataset consisted of 167 positive glycosylation sites and 280 negative sites, none of which were included in any of the model training sets. Benchmarks 1 and 2 are internal glycoproteomic mass spectrometry results for two proteins, used to evaluate glycosylation at 10 and 8 N-X-[S/T] sites, respectively.

Averaged across the three benchmarks, glyco-predictor achieved 74.8% accuracy, 75.6% precision, 74.5% recall, and 80.0% specificity. Although the model could potentially be improved, these benchmarks reflect acceptable performance for predicting human N-linked glycosylation.

Benchmark	Accuracy (%)	Precision (%)	Recall (%)	Specificity (%)
N-GlyDE dataset	79.4	67.0	88.6	73.9
Benchmark 1	70	60	75	66
Benchmark 2	75	100	60	100
Average	74.8	75.6	74.5	80.0

Table 1. Performance of Cyrus glyco-predictor tool on different benchmarks.
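For readers less familiar with the four metrics in Table 1, the sketch below shows how they follow from confusion-matrix counts. The function name `classification_metrics` is hypothetical, and the counts used in the example are an assumption: they are back-calculated from the reported N-GlyDE percentages and the stated 167 positive and 280 negative sites, not taken from the source.

```python
def classification_metrics(tp, fp, tn, fn):
    """Return accuracy, precision, recall, and specificity as percentages."""
    return {
        "accuracy": 100 * (tp + tn) / (tp + fp + tn + fn),
        "precision": 100 * tp / (tp + fp),
        "recall": 100 * tp / (tp + fn),
        "specificity": 100 * tn / (tn + fp),
    }


# Assumed counts, reconstructed from the reported N-GlyDE percentages
# (tp + fn = 167 positives, tn + fp = 280 negatives); not stated in the source.
n_glyde = classification_metrics(tp=148, fp=73, tn=207, fn=19)
for name, value in n_glyde.items():
    print(f"{name}: {value:.1f}%")
```

With these assumed counts the printed values match the N-GlyDE row of Table 1 (79.4, 67.0, 88.6, 73.9).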

References

DeepNGlyPred: A Deep Neural Network-Based Approach for Human N-Linked Glycosylation Site Prediction

DeepNGlyPred Github Repository

NetSurfP-3.0 Github Repository

Cyrus Glycosylation Predictor

Glycosylation Prediction