The BLAST tools API runs standalone BLAST+ executables against v5 NCBI databases for design-related queries. Three modes are available: `nopssm` and `pssm` for standard BLAST sequence alignments, and `patent` for patent queries. The `nopssm` and `pssm` modes run standard BLAST+ against the NR database; use `pssm` if you also want to generate a position-specific scoring matrix (PSSM) for your query. The `patent` mode runs BLAST+ against the curated NCBI patent protein sequence database (`pataa`), which is generated in partnership with the USPTO, and returns specific patent sequence hits along with the standard alignments. See the `blast-tools` repository for a more customizable setup using Docker.
Quick Start
Generate only sequence alignments for `input.fasta`:

```shell
cyrus engine submit blast input.fasta --mode nopssm
```
Generate sequence alignments and a PSSM for `input.fasta`:

```shell
cyrus engine submit blast input.fasta --mode pssm
```
Generate sequence alignments and patent information for `input.fasta`:

```shell
cyrus engine submit blast input.fasta --mode patent
```
Generate sequence alignments and return up to 1000 hits for `input.fasta`:

```shell
cyrus engine submit blast input.fasta --mode nopssm --max-target-sequences 1000
```
Inputs
A FASTA file containing the query sequence of interest.
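The input is described as a single query sequence. If you start from a multi-record FASTA file, a sketch like the following writes one file per record so each can be submitted separately. It assumes each record should be its own job; `split_fasta` and the `query_N.fasta` naming are hypothetical, not part of the cyrus CLI.

```shell
# Hypothetical helper: split a multi-record FASTA file into one file per
# record (query_1.fasta, query_2.fasta, ...) inside an output directory.
# Assumes each record should be submitted as a separate query.
split_fasta() {
  in_file="$1"
  out_dir="$2"
  mkdir -p "$out_dir" || return 1
  awk -v dir="$out_dir" '
    # a header line closes the previous output file and starts a new one
    /^>/ { if (file != "") close(file); n++; file = dir "/query_" n ".fasta" }
    # copy lines into the current file; text before the first header is skipped
    n > 0 { print > file }
  ' "$in_file"
}

# Example (commented out so the sketch runs without the cyrus CLI):
# split_fasta input.fasta queries
# for f in queries/query_*.fasta; do
#   cyrus engine submit blast "$f" --mode nopssm
# done
```

Closing each file when the next header appears keeps the number of open files constant, so large FASTA files with many records do not hit the open-file limit.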
Options
- `--fasta-file`: input FASTA file with the query sequence.
- `--mode`:
  - `nopssm`: run BLAST+ (`blastp`) for sequence alignments using the NR database.
  - `pssm`: run BLAST+ (`psiblast`) for sequence alignments and PSSM generation using the NR database.
  - `patent`: run BLAST+ (`blastp`) for sequence alignments and patent hits using the `pataa` database.
- `--max-target-sequences`: sets the maximum number of sequences that can be returned for a query (default: 500). Note that setting this maximum too high risks diluting the PSSM statistics.
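For readers more familiar with standalone BLAST+, the sketch below prints a roughly equivalent command line for each mode. It is an assumption, not the service's documented invocation: the real job may pass other flags (for example, output formats or e-value thresholds). `-max_target_seqs`, `-num_iterations`, `-out_pssm`, and `-out_ascii_pssm` are standard BLAST+ options, and `nr` and `pataa` are NCBI database names; the four PSI-BLAST iterations and the output filenames mirror the Outputs table. `blast_equivalent` is a hypothetical helper.

```shell
# Hypothetical sketch: print an approximately equivalent standalone BLAST+
# command for a mode, a --max-target-sequences value (default 500), and a
# query file (default input.fasta). Illustrative only; it runs nothing.
blast_equivalent() {
  mode="$1"
  max_targets="${2:-500}"
  query="${3:-input.fasta}"
  case "$mode" in
    nopssm)
      # blastp against NR: alignments only
      echo "blastp -query $query -db nr -max_target_seqs $max_targets -out query.out" ;;
    pssm)
      # psiblast against NR with 4 iterations; checkpoint and ASCII PSSM saved
      echo "psiblast -query $query -db nr -max_target_seqs $max_targets -num_iterations 4 -out query.out -out_pssm query.chk -out_ascii_pssm query.pssm" ;;
    patent)
      # blastp against the patent protein database
      echo "blastp -query $query -db pataa -max_target_seqs $max_targets -out query.out" ;;
    *)
      echo "unknown mode: $mode (expected nopssm, pssm, or patent)" >&2
      return 1 ;;
  esac
}

# Prints the blastp command with -max_target_seqs 1000
blast_equivalent nopssm 1000
```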
Outputs
Some outputs depend on the mode you choose to run.
Mode | Filename | Description |
---|---|---|
all | query.out | BLASTP or PSIBLAST query alignments (formatting differs between BLASTP and PSIBLAST; PSIBLAST is set to run 4 iterative rounds, and data from each round is logged to this file) |
all | query.entries | Accession IDs of query hits, parsed from query.out and used to gather the full sequences and descriptions for full-query.fasta |
all | full-query.fasta | FASTA file with the complete sequences and descriptions of query hits |
pssm | query.chk | PSSM checkpoint file generated by PSIBLAST |
pssm | query.pssm | PSSM from the PSIBLAST query in NCBI format |
patent | query.patents | Text file listing the NCBI accession code and patent description for each query hit (e.g., “ADA00576.1 Sequence 10 from patent US 7595057”) |
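Because `query.patents` lists one hit per line in the form shown in the table, the distinct patent references can be pulled out with standard text tools. This sketch assumes every relevant line ends with `from patent <reference>` (either capitalization of "patent"), as in the example; lines that do not match are skipped. `patent_numbers` is a hypothetical helper.

```shell
# Hypothetical helper: list the distinct patent references cited in a
# query.patents file. Assumes each line ends with "from patent <reference>",
# as in "ADA00576.1 Sequence 10 from patent US 7595057"; other lines are skipped.
patent_numbers() {
  # keep only the text after "from patent ", then de-duplicate
  sed -n 's/.* from [Pp]atent \(.*\)$/\1/p' "$1" | sort -u
}

# Example:
# patent_numbers query.patents
```

Several hits often come from the same patent, so de-duplicating with `sort -u` gives a shorter list of patents to review.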
Workflow
...