Epitope Scan

Please note that NetMHCPan and NetMHCIIPan modes are only available for users with a valid DTU Health Tech license agreement.

The epitope scan API runs the Rosetta MHC II epitope prediction algorithm, as well as the NetMHCPan and NetMHCIIPan prediction algorithms (provided that the user has a license to use those tools), on an input protein structure or sequence. This API uses a machine learning model to predict epitopes based entirely on the sequence of the protein. Structure input is provided as a convenience, but the API produces identical results regardless of whether a sequence or a structure is used as input.

Currently, API outputs and some inputs are specific to the algorithm chosen for prediction.

Further background information and advice on interpreting the results of the epitope scan tool can be found in the Cyrus Bench Documentation.

Quickstart

Command Line Examples

Run Rosetta epitope scan on an input sequence:

cyrus engine submit epitope-scan NLYIQWLKDGGPSSGRPPPS --allele-list-file alleles.txt

Run epitope scan on an input PDB using NetMHCPan:

cyrus engine submit epitope-scan input.pdb --mode=netmhcpan --allele-list-file alleles.txt

Run NetMHCIIPan on an input PDB using only the alleles in the specified file. Alleles should be listed one per line:

cyrus engine submit epitope-scan input.pdb --mode=netmhciipan --allele-list-file alleles.txt

Run NetMHCIIPan on an input FASTA with default allele list:

Python Examples

When using the Python library, the alleles you are interested in must be specified explicitly from the list in the introduction of this document. This behavior differs from the command line client, which defaults to searching for all alleles.

Run epitope scan on an input sequence:

Run epitope scan on an input PDB:

Inputs

You must specify one of a PDB file, a sequence, or a FASTA file.

  • --allele-list-file

    • File containing a list of MHC Class I or II allele names, 1 per line

    • See Supported Alleles for correct naming conventions/syntax for each API mode

    • See Default Allele Lists for mode specific default alleles

  • --pdb-file (str)

    • Input PDB file

    • CLI argument: --pdb-file input.pdb

    • Python submit() argument: pdb_path="input.pdb"

    • Do not include nonprotein residues.

    • Do not include multimodel (NMR-sourced) PDBs.  

  • --sequence (str)

    • Sequence – a protein sequence

    • CLI Arguments: --sequence NLYIQWLKDGGPSSGRPPPS

    • Python submit() argument: sequence="NLYIQWLKDGGPSSGRPPPS"

  • --fasta-file (stringArray)

    • A FASTA file with one or more sequences, or multiple FASTA files

    • CLI argument: --fasta-file sequence.fasta

      • --fasta-file *.fasta

      • --fasta-file fastas.zip

    • Python argument: fasta_file="input.fasta"
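The two PDB restrictions listed above (no nonprotein residues, no multimodel files) can be screened for before submission. The following is a minimal sketch, not part of the Cyrus tooling: the function name check_pdb_text is hypothetical, and it assumes that HETATM records are a reasonable proxy for nonprotein residues and that more than one MODEL record indicates a multimodel file. Modified amino acids are sometimes stored as HETATM, so a warning here calls for inspection rather than automatic rejection.

```python
def check_pdb_text(pdb_text):
    """Return a list of human-readable problems found in PDB-format text.

    Hypothetical pre-submission check; the record-based heuristics are
    assumptions, not rules taken from the epitope scan API itself.
    """
    model_count = 0
    hetatm_count = 0
    for line in pdb_text.splitlines():
        # The record name occupies the first six columns of a PDB line.
        record = line[:6].strip()
        if record == "MODEL":
            model_count += 1
        elif record == "HETATM":
            hetatm_count += 1

    problems = []
    if model_count > 1:
        problems.append(f"{model_count} MODEL records found (multimodel file)")
    if hetatm_count > 0:
        problems.append(
            f"{hetatm_count} HETATM records found (possible nonprotein residues)"
        )
    return problems
```

An empty list means neither heuristic fired; it does not guarantee that the file will be accepted.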

When using FASTA file input, you must strictly follow the guidelines described here: https://services.healthtech.dtu.dk/examples/example.fasta.html

  • all sequences must have a header with a name for the sequence

  • there must be no space between the > character starting the header and the sequence name

  • the file must not contain blank lines
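The three FASTA rules above are easy to check mechanically before submitting a job. The sketch below is illustrative only: validate_fasta_text is a hypothetical helper, not part of the Cyrus client, and it assumes that a single newline at the very end of the file does not count as a blank line.

```python
def validate_fasta_text(fasta_text):
    """Return a list of (line_number, message) tuples for rule violations."""
    lines = fasta_text.split("\n")
    # Assumption: one trailing newline at end of file is not a blank line.
    if lines and lines[-1] == "":
        lines = lines[:-1]

    errors = []
    seen_header = False
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            errors.append((number, "blank line"))
        elif line.startswith(">"):
            seen_header = True
            name = line[1:]
            if name.strip() == "":
                errors.append((number, "header has no sequence name"))
            elif name[0].isspace():
                errors.append((number, "space between '>' and sequence name"))
        elif not seen_header:
            # Sequence data that is not preceded by any header line.
            errors.append((number, "sequence data before any header"))
    return errors
```

For example, validate_fasta_text(open("sequence.fasta").read()) returns an empty list when none of the three rules is violated.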

Options

  • --mode

    • Prediction model to use, must be one of rosetta, netmhcpan or netmhciipan

    • default = rosetta

  • --native-sequence

    • A native protein sequence

    • If supplied, epitope-scan will be run on it as well, and the raw results as well as a delta file will be added to the outputs.

    • CLI Arguments: --native-sequence NLYIQWLKDGGPSSGRPPPS

    • Python submit() argument: native_sequence="NLYIQWLKDGGPSSGRPPPS"

  • --weak-binder-threshold

    • The percentile cutoff for distinguishing weak and strong binders in NetMHCPan and NetMHCIIPan

    • default = 5

Outputs

Rosetta Mode Epitope Scan Outputs

The API with --mode=rosetta returns a CSV file (rosetta_epitope_scan.csv) with the following fields:

  • begin_seqpos - start of the sequence window involved in the prediction

  • epitope_seq - sequence of the epitope involved in the prediction

  • allele - The MHC allele for which binding affinity is predicted

  • IC50_nM - The predicted IC50 in nanomolar (nM)

  • rank_percentage - Primary epitope score metric. The epitope is in the top n% of binders measured against random background. A lower number means the peptide is more likely to be an epitope for that allele.

  • score - The raw score of the prediction model, lower is better. Used to calculate the primary normalized score metric, rank_percentage

  • genome_sequence - Whether the epitope occurs in the human reference genome

  • known - Whether the sequence exists in the IEDB as a known T-cell activating epitope

NetMHCIIPan Mode Epitope Scan Outputs

The API with --mode=netmhciipan returns a TSV file (NetMHCIIPan_results.tsv) with the following fields:

  • Pos - starting position in sequence of peptide window

  • Peptide - 15mer peptide

  • ID - sequence ID

  • Weighted_NB - binding score weighted by allele population weights (for predicted binding alleles)

  • For every allele (<allele>) provided in --allele-list-file or in default allele sets:

    • <allele>-Core - predicted core binding register

    • <allele>-Score - Eluted ligand prediction score

    • <allele>-Rank - percentile rank of eluted ligand prediction score

    • <allele>-Score_BA - predicted binding affinity in log-scale

    • <allele>-nM - predicted binding affinity (IC50 in nanomolar)

    • <allele>-Rank_BA - percentile rank of predicted affinity compared to a set of 100,000 random natural peptides
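Because the per-allele fields are spread across columns, it can help to flatten the table into one record per peptide and allele. The sketch below assumes the column headers are literally of the form <allele>-Rank as described above and that the file is tab-separated with a header row; the function name and the inclusive comparison against the cutoff are assumptions, and the default of 5 simply mirrors the documented --weak-binder-threshold default.

```python
import csv
import io

# Columns that are not per-allele, taken from the field list above.
FIXED_COLUMNS = {"Pos", "Peptide", "ID", "Weighted_NB"}


def el_binders_from_tsv(tsv_text, rank_cutoff=5.0):
    """Return (pos, peptide, allele, el_rank) tuples with EL rank <= rank_cutoff."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    hits = []
    for row in reader:
        for column, value in row.items():
            if column is None or column in FIXED_COLUMNS:
                continue
            # Split on the last hyphen so allele names containing
            # hyphens stay intact.
            allele, _, field = column.rpartition("-")
            if field == "Rank" and float(value) <= rank_cutoff:
                hits.append(
                    (int(row["Pos"]), row["Peptide"], allele, float(value))
                )
    return hits
```

Reading the real output would look like el_binders_from_tsv(open("NetMHCIIPan_results.tsv").read()). Matching on "Rank_BA" instead of "Rank" would select on the binding affinity percentile rather than the eluted ligand percentile.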

NetMHCPan Mode Epitope Scan Outputs

The API with --mode=netmhcpan returns a TSV file (NetMHCPan_results.tsv) with the following fields:

  • Pos - residue number of peptide in protein sequence (starts from 0)

  • Peptide - 11mer peptide

  • ID - sequence ID

  • For every allele provided in --allele-list-file:

    • core - predicted 9mer binding core

    • icore - interaction core (sequence of binding core including eventual insertions/deletions)

    • EL-score - raw prediction score

    • EL_Rank - rank of predicted binding score compared to a set of random natural peptides

Outputs with --native-sequence

  • delta_results.csv

    • If --native-sequence is provided, the API returns the results for both the input sequence and the native sequence, as well as a file named delta_results.csv.

    • Contains the scores of the design input minus the scores of the native input.
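The delta described above is a plain subtraction, design minus native. The sketch below only illustrates that arithmetic: the column layout of delta_results.csv is not specified here, so the (position, allele) keys, the function name, and the choice to skip entries missing from either input are all assumptions.

```python
def score_deltas(design_scores, native_scores):
    """Return {key: design - native} for keys present in both mappings.

    Keys are assumed to be (position, allele) tuples; entries that exist
    in only one of the two inputs are skipped.
    """
    return {
        key: design_scores[key] - native_scores[key]
        for key in design_scores
        if key in native_scores
    }
```

If the subtracted metric is one where lower values indicate a more likely epitope, such as rank_percentage in Rosetta mode, a negative delta means the design is predicted to be more likely an epitope than the native sequence at that position and allele.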

Notes

NetMHCIIPan Eluted Ligand (EL) vs. Binding Affinity (BA)

  • NetMHCIIPan has two modes: EL and BA. By default, NetMHCIIPan only runs in EL mode, but the Cyrus API sets the flag to output results from both modes.

  • EL (eluted ligand) data comes from peptides that were naturally processed and eluted from the MHC complex (so no binding IC50 is available, but elution does show that the peptide bound)

  • BA (binding affinity) data is the measured IC50

  • EL contains a lot of data for self proteins

  • BA contains a lot of data for non-self proteins (bacteria, viruses)

  • BA predicts if a peptide will bind whereas EL tells us whether a peptide is likely to be naturally processed (and indirectly that it bound)

  • EL can gain information from varying lengths of peptides - which is good for MHCII

  • “Pan” means the network is universal (so no need to train individual networks)

  • NetMHCIIPan 4.1: “The output of the model is a prediction score for the likelihood of a peptide to be naturally presented by [an] MHC II receptor of choice. The output also includes %rank score, which normalizes prediction score by comparing to prediction of a set of random peptides. Optionally, the model also outputs BA prediction and %rank scores.”

  • Consider:

    • EL by nature will “miss” epitopes (as with MAPPs, a peptide may bind but not be detected)

    • BA will overpredict (not all binders will be immunogenic)

  • Key Points

    1. BA models are trained on binding affinity data and report predicted binding

    2. EL models are trained on eluted ligands (bound, presented by MHC, eluted) and report whether a peptide is likely to be naturally processed

    3. EL will under-predict; BA will over-predict

    4. EL is best used for "key epitopes" (but may still miss some)

    5. BA is best used for optimal coverage of possible epitopes
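One way to act on the key points above is to keep two lists per allele: a short list selected on the EL percentile rank ("key epitopes") and a broader list selected on the BA percentile rank (coverage). The sketch below is an illustration under assumptions, not a recommended protocol: the function name, the dictionary field names, and the use of a single shared cutoff of 5 for both ranks are all hypothetical.

```python
def split_el_ba(predictions, rank_cutoff=5.0):
    """Return (key_epitopes, coverage) peptide lists for one allele.

    Each prediction is a dict with hypothetical keys "peptide",
    "rank_el" (EL percentile rank) and "rank_ba" (BA percentile rank).
    """
    # EL under-predicts, so peptides it does flag form the short list.
    key_epitopes = [
        p["peptide"] for p in predictions if p["rank_el"] <= rank_cutoff
    ]
    # BA over-predicts, so it is used here for the broad coverage list.
    coverage = [
        p["peptide"] for p in predictions if p["rank_ba"] <= rank_cutoff
    ]
    return key_epitopes, coverage
```

Peptides that appear only in the coverage list are predicted binders with no supporting eluted ligand prediction. Whether to act on them depends on how costly a missed epitope is for the project.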

Interpreting epitope predictions

To replicate CAD-style "best practices", API users could:

  1. for each epitope, calculate 'n_hits' = number of alleles where rank_percentage < 10

  2. sort by n_hits
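The two steps above can be sketched against the Rosetta mode CSV as follows. The column names come from the output field list for rosetta_epitope_scan.csv. The function name, the choice to identify an epitope by (begin_seqpos, epitope_seq), and sorting with the most hits first are assumptions, since the sort direction is not stated.

```python
import csv
import io
from collections import Counter


def rank_epitopes_by_hits(csv_text, rank_cutoff=10.0):
    """Return [((begin_seqpos, epitope_seq), n_hits), ...], most hits first."""
    reader = csv.DictReader(io.StringIO(csv_text))
    n_hits = Counter()
    for row in reader:
        key = (int(row["begin_seqpos"]), row["epitope_seq"])
        # Register the epitope so that zero-hit epitopes are still reported.
        n_hits[key] += 0
        if float(row["rank_percentage"]) < rank_cutoff:
            n_hits[key] += 1
    # Sort by descending hit count, then by position for stable output.
    return sorted(n_hits.items(), key=lambda item: (-item[1], item[0]))
```

Typical use would be rank_epitopes_by_hits(open("rosetta_epitope_scan.csv").read()). Epitopes at the top of the result are predicted to bind the largest number of alleles in the submitted allele list.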

Supported Alleles

The epitope scan predicts the immunogenicity of the protein with respect to the following alleles:

Rosetta Epitope Scan Allele List (MHC Class II):

NetMHCIIPan Allele List (MHC Class II):

NetMHCPan Allele List (MHC Class I):

 

Default Allele Lists

If you do not specify an allele list when using either Rosetta Epitope Scan or NetMHCIIPan, a default set of alleles will be used. The default allele lists are below. There is no default allele list for NetMHCPan.

Rosetta Default Allele List:

NetMHCIIPan Default Allele List: