Tolerance Identification

Owned by Ben Baker

Last updated: Nov 09, 2023 by Aaron Aguhob

The Tolerance Identification API finds the top N closest n-mers (ranked by BLOSUM62 score) in the human proteome to a given protein sequence.

1 Quickstart
2 Inputs
3 Options
4 Outputs
5 Notes
- 5.1 Input proteome

Quickstart

Get the top 10 closest 9-mers:

cyrus engine submit tolerance-identification NLYIQWLKDGGPSSGRPPPS --top-n 10

Get the top 5 closest 9-mers and 15-mers:

cyrus engine submit tolerance-identification NLYIQWLKDGGPSSGRPPPS --top-n 5 --nmer-sizes 9,15

Inputs

--sequence (str)
- Input protein sequence to compare against

Options

--top-n (int)
- Collect the top N matches
- default = 20
--nmer-sizes (str)
- N-mer size(s) to run on, given as a comma-separated string (e.g. 9,10,11,12)
- default = 9
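
To illustrate how a comma-separated --nmer-sizes value translates into query windows, here is a minimal Python sketch. It assumes every overlapping window of each requested size is considered, which this page does not state explicitly. parse_nmer_sizes and iter_nmers are hypothetical helper names and are not part of the Cyrus CLI.

```python
from typing import Iterator, List, Tuple


def parse_nmer_sizes(value: str) -> List[int]:
    """Parse a comma-separated size string such as "9,15" into integers."""
    sizes = [int(part) for part in value.split(",") if part.strip()]
    if any(size < 1 for size in sizes):
        raise ValueError("n-mer sizes must be positive integers")
    return sizes


def iter_nmers(sequence: str, sizes: List[int]) -> Iterator[Tuple[int, int, str]]:
    """Yield (nmer_size, resnum, nmer) for every overlapping window.

    resnum is 1-indexed, matching the resnum column described under Outputs.
    Assumption: all overlapping windows of each size are enumerated.
    """
    for size in sizes:
        for start in range(len(sequence) - size + 1):
            yield size, start + 1, sequence[start:start + size]


if __name__ == "__main__":
    query = "NLYIQWLKDGGPSSGRPPPS"  # the Quickstart sequence (20 residues)
    windows = list(iter_nmers(query, parse_nmer_sizes("9,15")))
    print(len(windows))  # twelve 9-mers plus six 15-mers
    print(windows[0])    # first 9-mer with its 1-indexed position
```

Under that assumption, a 20-residue query with sizes 9,15 produces 18 windows, each of which would be searched for its own top-N matches.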

Outputs

out.csv
- CSV file containing the following columns
  - nmer_size - size of this nmer
  - resnum - residue number (1 indexed) of the nmer position in the query sequence
  - query_seq - query sequence
  - matchrank - rank (0 = best, N = worst) among the top-N closest (by BLOSUM62) n-mers to the query
  - matchscore - BLOSUM62 score of the match against the query sequence
  - matchseq - the matching sequence found in the human proteome
  - matchscore/max_score - matchscore divided by the score of a 100% identical match (normalized BLOSUM62 score)
out.json
- JSON format of the out.csv
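
The relationship between matchscore, matchrank and matchscore/max_score can be sketched in Python as follows. This is an illustration and not the service's implementation. It assumes ungapped, position-by-position scoring, uses only the I/L/V entries of BLOSUM62 rather than the full matrix, and searches a made-up candidate list instead of the human proteome. BLOSUM62_EXCERPT, pair_score, score and top_matches are hypothetical names.

```python
import heapq
from typing import Dict, List, Tuple

# Excerpt of BLOSUM62 restricted to I, L and V (illustration only);
# a real search needs the full 20-residue matrix.
BLOSUM62_EXCERPT: Dict[Tuple[str, str], int] = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1,
}


def pair_score(a: str, b: str) -> int:
    """Look up a substitution score; the matrix is symmetric."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def score(query: str, candidate: str) -> int:
    """Ungapped score: the sum of per-position substitution scores."""
    if len(query) != len(candidate):
        raise ValueError("n-mers must have equal length")
    return sum(pair_score(a, b) for a, b in zip(query, candidate))


def top_matches(query: str, candidates: List[str], top_n: int) -> List[dict]:
    """Return the top_n candidates as rows shaped like the out.csv columns."""
    max_score = score(query, query)  # score of a 100% identical match
    best = heapq.nlargest(top_n, candidates, key=lambda c: score(query, c))
    return [
        {
            "matchrank": rank,  # 0 = best
            "matchseq": seq,
            "matchscore": score(query, seq),
            "matchscore/max_score": score(query, seq) / max_score,
        }
        for rank, seq in enumerate(best)
    ]


if __name__ == "__main__":
    for row in top_matches("ILV", ["ILV", "LLV", "VVV", "LIL"], top_n=3):
        print(row)
```

In this toy case the identical match scores 12 and normalizes to 1.0, while LLV scores 10 and normalizes to roughly 0.83, so the normalized column makes scores comparable across n-mers of different sizes and compositions.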

Notes

Running this protocol requires between 4 and 5 GB of memory per CPU.

Input proteome

The input proteome file was taken from https://ftp.ensembl.org/pub/release-110/fasta/homo_sapiens/pep/Homo_sapiens.GRCh38.pep.abinitio.fa.gz
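
For readers who want to inspect the same proteome locally, the sketch below reads a gzipped FASTA file using only the Python standard library. It assumes the file has already been downloaded to disk. read_fasta and load_proteome are hypothetical helpers, and how the service itself parses or filters the file is not documented on this page.

```python
import gzip
from typing import Dict, Iterable, Iterator, Tuple


def read_fasta(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from FASTA-formatted lines."""
    header, chunks = None, []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        else:
            chunks.append(line)  # sequences may wrap across several lines
    if header is not None:
        yield header, "".join(chunks)


def load_proteome(path: str) -> Dict[str, str]:
    """Map the first token of each header (the record ID) to its sequence."""
    with gzip.open(path, "rt") as handle:
        return {header.split(" ", 1)[0]: seq for header, seq in read_fasta(handle)}
```

A call such as load_proteome("Homo_sapiens.GRCh38.pep.abinitio.fa.gz") would then return a dictionary of peptide sequences keyed by record identifier.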
