Please note that NetMHCPan and NetMHCIIPan modes are only available for users with a valid DTU Health Tech license agreement.
The epitope scan API runs the Rosetta MHC II epitope prediction algorithm, as well as the NetMHCPan and NetMHCIIPan prediction algorithms (provided that the user has a license to use those tools), on an input protein structure or sequence. This API uses a machine learning model to predict epitopes based entirely on the sequence of the protein. Structure input is provided as a convenience, but the API produces identical results whether a sequence or a structure is used as input.
Currently, API outputs and some inputs are specific to the algorithm chosen for prediction.
Further background information and advice on interpreting the results of the epitope scan tool can be found in the Cyrus Bench Documentation.
Run Rosetta epitope scan on an input sequence:
cyrus engine submit epitope-scan NLYIQWLKDGGPSSGRPPPS --allele-list-file alleles.txt
Run epitope scan on an input pdb using NetMHCPan:
cyrus engine submit epitope-scan input.pdb --mode=netmhcpan --allele-list-file alleles.txt
Run NetMHCIIPan on an input pdb using only the alleles in the specified file. Alleles should be listed 1 per line:
cyrus engine submit epitope-scan input.pdb --mode=netmhciipan --allele-list-file alleles.txt
Run NetMHCIIPan on an input FASTA with default allele list:
cyrus engine submit epitope-scan --fasta-file input.fasta --mode=netmhciipan
When using the Python library, the alleles you are interested in must be specified explicitly, using names from the Supported Alleles lists. This behavior differs from the command line client, which falls back to a default allele set when no allele list is given.
Run epitope scan on an input sequence:
from engine.epitope_scan.client import EpitopeScanClient
client = EpitopeScanClient()
job_id = client.submit(pdb_path=None, sequence="NLYIQWLKDGGPSSGRPPPS", mhc_list=["H-2-IAb", "HLA-DRB10101"])
Run epitope scan on an input PDB file:
from engine.epitope_scan.client import EpitopeScanClient
client = EpitopeScanClient()
job_id = client.submit(pdb_path="input.pdb", mhc_list=["H-2-IAb", "HLA-DRB10101"])
You must specify one of a PDB file, a sequence, or a FASTA file.
--allele-list-file
File containing a list of MHC Class I or II allele names, 1 per line
See Supported Alleles for correct naming conventions/syntax for each API mode
See Default Allele Lists for mode specific default alleles
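An allele list file is plain text with one allele name per line. As a minimal sketch (the helper write_allele_list is hypothetical and not part of the Cyrus client), the Python below writes such a file, trimming whitespace and dropping blank and duplicate entries. The names used are the ones from the Python examples in this document; check Supported Alleles for the naming each mode expects. Whether the API tolerates a trailing newline is an assumption.

```python
from pathlib import Path


def write_allele_list(alleles, path="alleles.txt"):
    """Write allele names to `path`, one per line, for --allele-list-file.

    Surrounding whitespace is trimmed, blank entries are skipped, and
    duplicates are dropped while preserving the original order.
    """
    unique = []
    for name in alleles:
        name = name.strip()
        if name and name not in unique:
            unique.append(name)
    Path(path).write_text("\n".join(unique) + "\n")
    return unique


# Allele names from the Python examples in this document (Rosetta-style naming).
written = write_allele_list(["H-2-IAb", "HLA-DRB10101", " H-2-IAb ", ""])
```

The resulting alleles.txt can then be passed to the CLI with --allele-list-file alleles.txt.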
--pdb-file
(str)
Input PDB file – a PDB file
CLI argument: --pdb-file input.pdb
Python submit() argument: pdb_path="input.pdb"
Do not include nonprotein residues.
Do not include multimodel (NMR-sourced) PDBs.
--sequence
(str)
Sequence – a protein sequence
CLI argument: --sequence NLYIQWLKDGGPSSGRPPPS
Python submit() argument: sequence="NLYIQWLKDGGPSSGRPPPS"
--fasta-file
(stringArray)
A FASTA file with one or more sequences, or multiple FASTA files
CLI argument: --fasta-file sequence.fasta
--fasta-file *.fasta
--fasta-file fastas.zip
Python argument: fasta_file="input.fasta"
When using FASTA file input, you must strictly follow the formatting guidelines described here: https://services.healthtech.dtu.dk/examples/example.fasta.html
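Assuming the linked guidelines describe the standard FASTA layout (a header line starting with > followed by sequence lines), an input file can be generated with a sketch like the one below. The to_fasta helper, the design_1 identifier, and the 60-residue line width are illustrative choices, not requirements stated by the API; the linked page remains authoritative, for example on which characters are allowed in headers.

```python
def to_fasta(records, width=60):
    """Format (identifier, sequence) pairs as FASTA text.

    Each record becomes a '>' header line followed by the sequence,
    wrapped at `width` residues per line (an assumed, conventional width).
    """
    lines = []
    for identifier, sequence in records:
        sequence = "".join(sequence.split()).upper()  # drop any whitespace
        lines.append(f">{identifier}")
        for start in range(0, len(sequence), width):
            lines.append(sequence[start:start + width])
    return "\n".join(lines) + "\n"


# Hypothetical identifier; the sequence is the example used throughout this page.
fasta_text = to_fasta([("design_1", "NLYIQWLKDGGPSSGRPPPS")])
with open("input.fasta", "w") as handle:
    handle.write(fasta_text)
```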
--mode
Prediction model to use; must be one of rosetta, netmhcpan, or netmhciipan
default = rosetta
--native-sequence
A native protein sequence
If supplied, epitope scan is also run on this sequence, and its raw results, together with a delta file (delta_results.csv), are added to the outputs.
CLI argument: --native-sequence NLYIQWLKDGGPSSGRPPPS
Python submit() argument: native_sequence="NLYIQWLKDGGPSSGRPPPS"
--weak-binder-threshold
The percentile cutoff for distinguishing weak and strong binders in NetMHCPan and NetMHCIIPan
default = 5
The API with --mode=rosetta returns a CSV file (rosetta_epitope_scan.csv) with the following fields:
begin_seqpos - start of the sequence window involved in the prediction
epitope_seq - sequence of the epitope involved in the prediction
allele - the MHC allele for which binding affinity is predicted
IC50_nM - the predicted IC50, in nanomolar (nM)
rank_percentage - primary epitope score metric: the epitope is in the top n% of binders measured against a random background. A lower number means the peptide is more likely to be an epitope for that allele.
score - the raw score of the prediction model (lower is better), used to calculate the primary normalized score metric, rank_percentage
genome_sequence - whether the epitope occurs in the human reference genome
known - whether the sequence exists in the IEDB as a known T-cell activating epitope
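Because the Rosetta CSV has one row per sequence window and allele, the strongest predicted epitope for each allele is simply the row with the minimum rank_percentage. The sketch below assumes the file has a header row with the column names listed above; strongest_per_allele is a hypothetical helper, and the rows in the example are made-up placeholders, not real predictions.

```python
import csv
import io


def strongest_per_allele(csv_text):
    """Return {allele: row}, keeping the row with the lowest rank_percentage."""
    best = {}
    for row in csv.DictReader(io.StringIO(csv_text)):
        allele = row["allele"]
        rank = float(row["rank_percentage"])
        if allele not in best or rank < float(best[allele]["rank_percentage"]):
            best[allele] = row
    return best


# Made-up values for illustration only; real files come from the API.
example = """begin_seqpos,epitope_seq,allele,IC50_nM,rank_percentage,score,genome_sequence,known
1,NLYIQWLKDGGPSSG,HLA-DRB10101,120.0,4.5,-1.2,False,False
2,LYIQWLKDGGPSSGR,HLA-DRB10101,900.0,22.0,0.3,False,False
1,NLYIQWLKDGGPSSG,H-2-IAb,300.0,12.0,-0.4,False,False
"""
best = strongest_per_allele(example)
for allele, row in best.items():
    print(allele, row["epitope_seq"], row["rank_percentage"])
```

For real output, pass the file contents instead, e.g. strongest_per_allele(open("rosetta_epitope_scan.csv").read()).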
The API with --mode=netmhciipan returns a TSV file (NetMHCIIPan_results.tsv) with the following fields:
Pos - starting position of the peptide window in the sequence
Peptide - 15mer peptide
ID - sequence ID
Weighted_NB - binding score weighted by allele population weights (for predicted binding alleles)
For every allele (<allele>) provided in --allele-list-file or in the default allele sets:
<allele>-Core - predicted core binding register
<allele>-Score - eluted ligand (EL) prediction score
<allele>-Rank - percentile rank of the eluted ligand prediction score
<allele>-Score_BA - predicted binding affinity, in log scale
<allele>-nM - predicted binding affinity as IC50, in nanomolar (nM)
<allele>-Rank_BA - percentile rank of the predicted affinity compared to a set of 100,000 random natural peptides
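The NetMHCIIPan results are wide: one row per peptide, with six columns per allele. A minimal sketch for reshaping each row into one record per (peptide, allele) pair follows, assuming the header uses the literal <allele>-<field> pattern shown above. Splitting on the last hyphen keeps allele names that themselves contain hyphens intact. The helper to_long_records is hypothetical.

```python
import csv
import io

# Per-allele column suffixes, as documented for NetMHCIIPan_results.tsv.
SUFFIXES = {"Core", "Score", "Rank", "Score_BA", "nM", "Rank_BA"}


def to_long_records(tsv_text):
    """Reshape one-row-per-peptide output into one dict per (peptide, allele)."""
    records = []
    for row in csv.DictReader(io.StringIO(tsv_text), delimiter="\t"):
        per_allele = {}
        for column, value in row.items():
            # "<allele>-<field>": split on the LAST hyphen only, so allele
            # names such as "HLA-DQA10501-DQB10201" survive intact.
            allele, sep, suffix = column.rpartition("-")
            if sep and suffix in SUFFIXES:
                per_allele.setdefault(allele, {})[suffix] = value
        for allele, fields in per_allele.items():
            records.append({"Pos": row["Pos"], "Peptide": row["Peptide"],
                            "ID": row["ID"], "allele": allele, **fields})
    return records
```

Usage: records = to_long_records(open("NetMHCIIPan_results.tsv").read()). Values are kept as strings; convert them with float() before comparing ranks.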
The API with --mode=netmhcpan returns a TSV file (NetMHCPan_results.tsv) with the following fields:
Pos - residue number of the peptide in the protein sequence (starts from 0)
Peptide - 11mer peptide
ID - sequence ID
For every allele provided in --allele-list-file:
core - predicted 9mer binding core
icore - interaction core (sequence of the binding core including any insertions/deletions)
EL-score - raw prediction score
EL_Rank - percentile rank of the predicted binding score compared to a set of random natural peptides
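Since the NetMHCPan Pos is 0-based, adding 1 gives conventional 1-based residue numbering. The hypothetical helper residue_span below does this and also checks that the peptide really starts at that offset, which catches off-by-one mistakes when mapping hits back onto a structure.

```python
def residue_span(sequence, pos, peptide):
    """Convert a 0-based Pos into a 1-based, inclusive (first, last) range.

    Raises ValueError if `peptide` does not start at offset `pos`.
    """
    if sequence[pos:pos + len(peptide)] != peptide:
        raise ValueError(f"{peptide} does not start at offset {pos}")
    return pos + 1, pos + len(peptide)


# Illustrative 11mer taken from the example sequence used on this page.
seq = "NLYIQWLKDGGPSSGRPPPS"
print(residue_span(seq, 2, "YIQWLKDGGPS"))  # (3, 13)
```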
--native-sequence
delta_results.csv
If --native-sequence is provided, the API returns the results for both the input sequence and the native sequence, as well as a file named delta_results.csv, which contains the scores of the design input minus the scores of the native input.
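The delta is a plain difference of scores (design minus native). The sketch below illustrates the idea for the Rosetta output, under the assumption that windows are matched on begin_seqpos and allele; how delta_results.csv actually pairs windows is not specified here, so treat delta_scores (a hypothetical helper) as an approximation. With rank_percentage, a negative delta means the design window is the more likely epitope.

```python
def delta_scores(design_rows, native_rows, field="rank_percentage",
                 key=("begin_seqpos", "allele")):
    """Return {window key: design score - native score} for shared windows.

    Rows are dicts such as those produced by csv.DictReader. Matching
    windows on `key` is an assumption made for this sketch.
    """
    native = {tuple(row[k] for k in key): float(row[field]) for row in native_rows}
    deltas = {}
    for row in design_rows:
        window = tuple(row[k] for k in key)
        if window in native:
            deltas[window] = float(row[field]) - native[window]
    return deltas


design = [{"begin_seqpos": "1", "allele": "H-2-IAb", "rank_percentage": "4.5"}]
native = [{"begin_seqpos": "1", "allele": "H-2-IAb", "rank_percentage": "12.0"}]
print(delta_scores(design, native))  # {('1', 'H-2-IAb'): -7.5}
```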
NetMHCIIPan has two modes: EL and BA. By default, NetMHCIIPan runs only in EL mode, but the Cyrus API enables the flag that outputs results from both modes.
EL (eluted ligand) data comes from peptides that were naturally processed and eluted from the MHC complex (so no binding IC50 is available, but it does show that the peptide bound)
BA (binding affinity) data consists of measured IC50 values
EL contains a lot of data for self proteins
BA contains a lot of data for non-self proteins (bacteria, viruses)
BA predicts whether a peptide will bind, whereas EL tells us whether a peptide is likely to be naturally processed (and, indirectly, that it bound)
EL can learn from peptides of varying lengths, which is useful for MHC II, whose ligands vary in length
“Pan” means the network is universal: one network covers all alleles, so there is no need to train individual networks per allele
NetMHCIIPan 4.1: “The output of the model is a prediction score for the likelihood of a peptide to be naturally presented by [an] MHC II receptor of choice. The output also includes %rank score, which normalizes prediction score by comparing to prediction of a set of random peptides. Optionally, the model also outputs BA prediction and %rank scores.”
Consider:
EL will by nature “miss” some epitopes (as with MAPPs, a peptide may bind but not be detected)
BA will overpredict (not all binders will be immunogenic)
Key Points
BA models are trained on binding affinity data and predict binding
EL models are trained on eluted ligands (bound, presented by MHC, eluted) and report whether a peptide is likely to be naturally processed
EL will under-predict; BA will over-predict
EL is best used for "key epitopes" (but may still miss some)
BA is best used for optimal coverage of possible epitopes
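One way to act on these points with NetMHCIIPan output, which reports both an EL rank (<allele>-Rank) and a BA rank (<allele>-Rank_BA), is to treat EL hits as key epitopes and BA-only hits as extra candidates for broader coverage. This is a sketch, not a Cyrus-recommended rule: the function classify is hypothetical, and the cutoff of 10 is an assumed value borrowed from the rank_percentage < 10 recipe used for the Rosetta output.

```python
def classify(el_rank, ba_rank, cutoff=10.0):
    """Label a peptide-allele pair from its EL and BA percentile ranks.

    Lower ranks mean stronger predictions. EL hits are treated as key
    epitopes; BA-only hits widen coverage. `cutoff` is an assumed value.
    """
    if el_rank < cutoff:
        return "key epitope (EL)"
    if ba_rank < cutoff:
        return "candidate epitope (BA only)"
    return "not predicted"


print(classify(3.2, 8.1))    # key epitope (EL)
print(classify(40.0, 5.0))   # candidate epitope (BA only)
```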
To replicate CAD-style "best practices", API users could:
for each epitope, calculate 'n_hits' = number of alleles where rank_percentage < 10
sort by n_hits
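The recipe above can be sketched with the Python standard library, assuming the Rosetta CSV layout described earlier (one row per window and allele). Here an epitope is identified by its (begin_seqpos, epitope_seq) pair, each allele is counted at most once, and rank_by_hits is a hypothetical helper name.

```python
import csv
import io
from collections import defaultdict


def rank_by_hits(csv_text, cutoff=10.0):
    """Return [((begin_seqpos, epitope_seq), n_hits), ...], most hits first.

    n_hits counts distinct alleles whose rank_percentage is below `cutoff`.
    """
    hits = defaultdict(set)
    for row in csv.DictReader(io.StringIO(csv_text)):
        epitope = (int(row["begin_seqpos"]), row["epitope_seq"])
        alleles = hits[epitope]  # creates an empty set the first time
        if float(row["rank_percentage"]) < cutoff:
            alleles.add(row["allele"])
    counts = [(epitope, len(alleles)) for epitope, alleles in hits.items()]
    # Sort by n_hits descending, breaking ties by sequence position.
    return sorted(counts, key=lambda item: (-item[1], item[0][0]))
```

For real output, call rank_by_hits(open("rosetta_epitope_scan.csv").read()); epitopes with zero hits are kept at the end of the list so that every window appears once.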
The epitope scan predicts the immunogenicity of the protein with respect to the following alleles:
The Rosetta and NetMHCIIPan models use slightly different names for some alleles.
Rosetta Epitope Scan Allele List (MHC Class II):
NetMHCIIPan Allele List (MHC Class II):
NetMHCPan Allele List (MHC Class I):
If you do not specify an allele list when using either Rosetta Epitope Scan or NetMHCIIPan, a default set of alleles will be used. The default allele lists are below. There is no default allele list for NetMHCPan.
Rosetta Default Allele List:
NetMHCIIPan Default Allele List: