Epitope Scan
Please note that NetMHCPan and NetMHCIIPan modes are only available for users with a valid DTU Health Tech license agreement.
The epitope scan API runs the Rosetta MHC II epitope prediction algorithm, as well as the NetMHCPan and NetMHCIIPan prediction algorithms (provided that the user has a license to use those tools) on an input protein structure or sequence. This API uses a machine learning model to predict epitopes based entirely on the sequence of the protein. Structure input is provided as a convenience but the API will produce identical results regardless of whether a sequence or structure is used as input.
Currently, API outputs and some inputs are specific to the algorithm chosen for prediction.
Further background information and advice on interpreting the results of the epitope scan tool can be found in the Cyrus Bench Documentation.
- 1 Quickstart
- 2 Inputs
- 3 Options
- 4 Outputs
- 5 Notes
Quickstart
Command Line Examples
Run Rosetta epitope scan on an input sequence:
cyrus engine submit epitope-scan NLYIQWLKDGGPSSGRPPPS --allele-list-file alleles.txt
Run epitope scan on an input pdb using NetMHCPan:
cyrus engine submit epitope-scan input.pdb --mode=netmhcpan --allele-list-file alleles.txt
Run NetMHCIIPan on an input pdb using only the alleles in the specified file. Alleles should be listed 1 per line:
cyrus engine submit epitope-scan input.pdb --mode=netmhciipan --allele-list-file alleles.txt
Run NetMHCIIPan on an input FASTA with default allele list:
Python Examples
When using the Python library, the alleles you are interested in must be specified explicitly from the list in the introduction of this document. This behavior differs from the command line client, which defaults to searching for all alleles.
Run epitope scan on an input sequence:
Run epitope scan on an input pdb:
Inputs
You must specify one of a PDB file, a sequence, or a FASTA file.
--allele-list-file
File containing a list of MHC Class I or II allele names, 1 per line
See Supported Alleles for correct naming conventions/syntax for each API mode
See Default Allele Lists for mode specific default alleles
--pdb-file
(str) Input PDB file
CLI argument:
--pdb-file input.pdb
Python submit() argument:
pdb_path="input.pdb"
Do not include nonprotein residues.
Do not include multimodel (NMR-sourced) PDBs.
--sequence
(str) Sequence – a protein sequence
CLI argument:
--sequence NLYIQWLKDGGPSSGRPPPS
Python submit() argument:
sequence="NLYIQWLKDGGPSSGRPPPS"
--fasta-file
(stringArray) A FASTA file with one or more sequences, or multiple FASTA files
CLI argument:
--fasta-file sequence.fasta
--fasta-file *.fasta
--fasta-file fastas.zip
Python argument:
fasta_file="input.fasta"
When using FASTA file input, you must strictly follow the guidelines described here: https://services.healthtech.dtu.dk/examples/example.fasta.html
all sequences must have a header with a name for the sequence
there must be no space between the > character starting the header and the sequence name
the file must contain no blank lines
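The three FASTA rules above can be checked locally before submitting a job. The following is a minimal sketch, not part of the Cyrus client: the function name check_fasta and its message wording are hypothetical, and it checks only the three rules listed here, not the full DTU guidelines.

```python
def check_fasta(text):
    """Return a list of problems found in FASTA text (empty list if none)."""
    problems = []
    lines = text.split("\n")
    # A single newline terminating the file is not counted as a blank line.
    if lines and lines[-1] == "":
        lines = lines[:-1]
    seen_header = False
    for number, line in enumerate(lines, start=1):
        if line.strip() == "":
            problems.append(f"line {number}: blank line")
        elif line.startswith(">"):
            seen_header = True
            name = line[1:]
            # The name must follow '>' directly, with no space in between.
            if name == "" or name[0].isspace():
                problems.append(f"line {number}: header needs a name directly after '>'")
        elif not seen_header:
            problems.append(f"line {number}: sequence data before any header")
    return problems
```

Calling check_fasta(open("input.fasta").read()) returns an empty list when none of the three rules is violated.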
Options
--mode
Prediction model to use, must be one of rosetta, netmhcpan, or netmhciipan
default = rosetta
--native-sequence
A native protein sequence
If supplied, epitope-scan will be run on it as well, and the raw results as well as a delta file will be added to the outputs.
CLI argument:
--native-sequence NLYIQWLKDGGPSSGRPPPS
Python submit() argument:
native_sequence="NLYIQWLKDGGPSSGRPPPS"
--weak-binder-threshold
The percentile cutoff for distinguishing weak and strong binders in NetMHCPan and NetMHCIIPan
default = 5
Outputs
Rosetta Mode Epitope Scan Outputs
The API with --mode=rosetta returns a CSV file (rosetta_epitope_scan.csv) with the following fields:
begin_seqpos - start of the sequence window involved in the prediction
epitope_seq - sequence of the epitope involved in the prediction
allele - the MHC allele for which binding affinity is being predicted
IC50_nM - the predicted IC50 in nanomolar
rank_percentage - primary epitope score metric. The epitope is in the top n% of binders measured against random background. A lower number means a more likely epitope for that allele.
score - the raw score of the prediction model; lower is better. Used to calculate the primary normalized score metric, rank_percentage
genome_sequence - whether the epitope is in the human reference genome
known - whether the sequence exists in the IEDB as a known T-cell activating epitope
NetMHCIIPan Mode Epitope Scan Outputs
The API with --mode=netmhciipan returns a TSV file (NetMHCIIPan_results.tsv) with the following fields:
Pos - starting position in sequence of peptide window
Peptide - 15mer peptide
ID - sequence ID
Weighted_NB - binding score weighted by allele population weights (for predicted binding alleles)
For every allele (<allele>) provided in --allele-list-file or in default allele sets:
<allele>-Core - predicted core binding register
<allele>-Score - eluted ligand prediction score
<allele>-Rank - percentile rank of eluted ligand prediction score
<allele>-Score_BA - predicted binding affinity in log-scale
<allele>-nM - predicted binding affinity (nanomolar IC50)
<allele>-Rank_BA - percentile rank of predicted affinity compared to a set of 100,000 random natural peptides
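Because the per-allele columns share the <allele>- prefix, the alleles present in a result file can be recovered from the header. The sketch below is illustrative only: it assumes the TSV has a single header row with exactly the column names listed above, the function name binders_by_peptide is hypothetical, and treating an EL rank at or below the threshold as a binder is an assumption rather than documented behavior.

```python
import csv
import io

RANK_SUFFIX = "-Rank"  # eluted ligand percentile rank column, per allele


def binders_by_peptide(tsv_text, threshold=5.0):
    """Map (Pos, Peptide) to the alleles whose EL rank is <= threshold."""
    reader = csv.DictReader(io.StringIO(tsv_text), delimiter="\t")
    # "<allele>-Rank_BA" does not end with "-Rank", so only EL ranks match.
    rank_columns = [c for c in reader.fieldnames if c.endswith(RANK_SUFFIX)]
    result = {}
    for row in reader:
        hits = [
            column[: -len(RANK_SUFFIX)]  # strip the suffix to get the allele name
            for column in rank_columns
            if float(row[column]) <= threshold
        ]
        result[(int(row["Pos"]), row["Peptide"])] = hits
    return result
```

For example, binders_by_peptide(open("NetMHCIIPan_results.tsv").read()) gives, for each peptide window, the list of alleles that pass the cutoff. Matching on the column suffix rather than splitting on the first hyphen keeps allele names that themselves contain hyphens intact.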
NetMHCPan Mode Epitope Scan Outputs
The API with --mode=netmhcpan returns a TSV file (NetMHCPan_results.tsv) with the following fields:
Pos - residue number of peptide in protein sequence (starts from 0)
Peptide - 11mer peptide
ID - sequence ID
For every allele provided in --allele-list-file:
core - predicted 9mer binding core
icore - interaction core (sequence of binding core including eventual insertions/deletions)
EL-score - raw prediction score
EL_Rank - rank of predicted binding score compared to a set of random natural peptides
Outputs with --native-sequence
delta_results.csv
If --native-sequence is provided, the API will return the results for both the input sequence and the native sequence, as well as a file named delta_results.csv. This file contains the scores of the design input minus the scores of the native input.
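The "design minus native" arithmetic can be illustrated on Rosetta-mode rows. This is a sketch under stated assumptions, not a description of how delta_results.csv is actually produced: its column layout is not documented here, so the pairing of rows by begin_seqpos and allele, the choice of rank_percentage as the score, and the function name score_deltas are all hypothetical.

```python
def score_deltas(design_rows, native_rows,
                 key_fields=("begin_seqpos", "allele"),
                 score_field="rank_percentage"):
    """Return {key: design score - native score} for keys present in both inputs."""
    native = {
        tuple(row[k] for k in key_fields): float(row[score_field])
        for row in native_rows
    }
    deltas = {}
    for row in design_rows:
        key = tuple(row[k] for k in key_fields)
        if key in native:  # skip windows that have no native counterpart
            deltas[key] = float(row[score_field]) - native[key]
    return deltas
```

With rank_percentage as the score, a positive delta means the design window ranks as a less likely epitope than the native window for that allele, since lower rank_percentage indicates a more likely epitope.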
Notes
NetMHCIIPan Eluted Ligand (EL) vs. Binding Affinity (BA)
NetMHCIIPan has two modes: EL and BA. By default, NetMHCIIPan only runs in EL mode, but the Cyrus API has activated the flag to output results from both modes.
EL (eluted ligand) data is the result of a peptide being naturally processed and eluted from the MHC complex (so no binding IC50 is available, but it does tell you that the peptide bound)
BA (binding affinity) data is the measured IC50
EL contains a lot of data for self proteins
BA contains a lot of data for non-self proteins (bacteria, viruses)
BA predicts if a peptide will bind whereas EL tells us whether a peptide is likely to be naturally processed (and indirectly that it bound)
EL can gain information from varying lengths of peptides - which is good for MHCII
“Pan” means the network is universal (so no need to train individual networks)
NetMHCIIPan 4.1: “The output of the model is a prediction score for the likelihood of a peptide to be naturally presented by [an] MHC II receptor of choice. The output also includes %rank score, which normalizes prediction score by comparing to prediction of a set of random peptides. Optionally, the model also outputs BA prediction and %rank scores.”
Consider:
EL by nature will “miss” epitopes (as with MAPPs, a peptide may bind but not be detected)
BA will overpredict (not all binders will be immunogenic)
Key Points
BA models are trained on binding affinity data and report predicted binding
EL models are trained on eluted ligands (bound, presented by MHC, eluted) and report whether a peptide is likely to be naturally processed
EL will under-predict; BA will over-predict
EL is best used for "key epitopes" (but may still miss some)
BA is best used for optimal coverage of possible epitopes
Interpreting epitope predictions
To replicate CAD-style "best practices", API users could:
for each epitope, calculate 'n_hits' = number of alleles where rank_percentage < 10
sort by n_hits
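The two steps above can be sketched against the Rosetta-mode CSV fields (begin_seqpos, epitope_seq, rank_percentage). This is a minimal illustration rather than an official recipe: the function names are hypothetical, and sorting in descending order (most hits first) is an assumption, since the text only says to sort by n_hits.

```python
import csv
from collections import defaultdict


def load_rows(csv_path):
    """Read rosetta_epitope_scan.csv into a list of dict rows."""
    with open(csv_path, newline="") as handle:
        return list(csv.DictReader(handle))


def rank_epitopes_by_hits(rows, cutoff=10.0):
    """Count, per epitope, the alleles with rank_percentage < cutoff.

    Returns [((begin_seqpos, epitope_seq), n_hits), ...], most hits first.
    """
    n_hits = defaultdict(int)
    for row in rows:
        key = (int(row["begin_seqpos"]), row["epitope_seq"])
        # Adding a bool counts 1 for a hit and 0 otherwise, so epitopes
        # with no hits still appear in the result with n_hits == 0.
        n_hits[key] += float(row["rank_percentage"]) < cutoff
    return sorted(n_hits.items(), key=lambda item: item[1], reverse=True)
```

For example, rank_epitopes_by_hits(load_rows("rosetta_epitope_scan.csv")) lists the epitopes predicted to bind the largest number of alleles first.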
Supported Alleles
The epitope scan predicts the immunogenicity of the protein with respect to the following alleles:
Rosetta epitope Scan Allele List (MHC Class II):
NetMHCIIPan Allele List(MHC Class II):
NetMHCPan Allele List (MHC Class I):
Default Allele Lists
If you do not specify an allele list when using either Rosetta Epitope Scan or NetMHCIIPan, a default set of alleles will be used. The default allele lists are below. There is no default allele list for NetMHCPan.
Rosetta Default Allele List:
NetMHCIIPan Default Allele List: