PCCCB Flavored BLAST

The Basic Local Alignment Search Tool (BLAST) finds regions of local similarity between sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.


BLAST Programs

blastn
compares a nucleotide query sequence against a nucleotide sequence database.

blastp
compares an amino acid query sequence against a protein sequence database.

blastx
compares a nucleotide query sequence translated in all reading frames against a protein sequence database.

tblastn
compares a protein query sequence against a nucleotide sequence database dynamically translated in all reading frames.

tblastx
compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide sequence database.
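
The "all reading frames" translation used by blastx, tblastn, and tblastx can be illustrated with a short Python sketch. It assumes the standard genetic code and the usual frame labels (+1 to +3 for the forward strand, -1 to -3 for the reverse complement). The function names are hypothetical and are not part of BLAST.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a nucleotide sequence."""
    return seq.upper().translate(COMPLEMENT)[::-1]


def translate(seq):
    """Translate complete codons from position 0; unknown codons become 'X'."""
    seq = seq.upper()
    return "".join(
        CODON_TABLE.get(seq[i:i + 3], "X") for i in range(0, len(seq) - 2, 3)
    )


def six_frame_translations(seq):
    """Map frame labels (+1..+3, -1..-3) to protein translations."""
    frames = {}
    for strand, strand_seq in ((1, seq), (-1, reverse_complement(seq))):
        for offset in range(3):
            frames[strand * (offset + 1)] = translate(strand_seq[offset:])
    return frames
```

Calling six_frame_translations on a nucleotide query returns one protein string per frame; stop codons appear as "*".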

makeblastdb - BLAST database
The makeblastdb application produces BLAST databases from FASTA files. It is possible to use completely unstructured (or even blank) FASTA definition lines, but this is not recommended. Assigning a unique identifier to every sequence in the database allows you to retrieve a sequence by its identifier and to associate every sequence with a taxonomic node (through the taxid of the sequence). The unique identifier can be a simple string (as in the example below) or the actual accession of the sequence if the sequence comes from a public database (e.g., GenBank). Being able to associate a database sequence with a taxonomic node is especially powerful for the version 5 databases, which BLAST can use to limit a search by taxonomy. The identifier should begin immediately after the “>” sign on the definition line and contain no spaces, and makeblastdb should be run with the -parse_seqids flag.
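
The identifier rules above can be checked before a database is built. The following Python sketch is only an illustration: check_deflines and makeblastdb_command are hypothetical helper names, the file name in the usage note is a placeholder, and the command is assembled but not executed. The flags -in, -dbtype, and -parse_seqids are standard makeblastdb options.

```python
def check_deflines(fasta_text):
    """Return a list of problems found in the FASTA definition lines."""
    problems = []
    seen = set()
    for number, line in enumerate(fasta_text.splitlines(), start=1):
        if not line.startswith(">"):
            continue  # sequence line, nothing to check
        fields = line[1:].split()
        # The identifier must start directly after ">" (no leading space).
        if not fields or line[1:2].isspace():
            problems.append(f"line {number}: identifier must follow '>' directly")
            continue
        seq_id = fields[0]  # text up to the first whitespace
        if seq_id in seen:
            problems.append(f"line {number}: duplicate identifier {seq_id}")
        seen.add(seq_id)
    return problems


def makeblastdb_command(fasta_path, dbtype="nucl"):
    """Build (but do not run) a makeblastdb command line."""
    return ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype, "-parse_seqids"]
```

For example, makeblastdb_command("mydb.fasta") yields an argument list that could be passed to subprocess.run once check_deflines reports no problems.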


BLAST Search Parameters

Sequence Format
Fasta: A common sequence format offered by most sequence and alignment editors.
Format: A 'greater than' symbol (>) followed by the sequence name on one line. The sequence itself starts on the following line and may span several lines.
Example:
>RL228e1
AGAAGAAGAGGTAGTAATTAGATCTGACAATTTCACGGACAATGCTAAAACTATAATAGTACAGCTGAAAGAACCTGT
AGAAATTAATTGTACAAGACCCCACAACAATACAAGAAGAAGGATAAGTATAGGACCAGGGAGAGCATTTTATGCAAC
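
As an illustration of the format, a minimal FASTA reader might look like the following Python sketch. The function name parse_fasta is hypothetical, and the sketch assumes the whole definition line after ">" is used as the sequence name.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping sequence name to sequence."""
    records = {}
    name = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            name = line[1:].strip()  # everything after '>' is the name
            records[name] = []
        elif name is not None:
            records[name].append(line)  # sequence may span several lines
    return {name: "".join(chunks) for name, chunks in records.items()}
```

Sequence lines that follow the same definition line are concatenated, so a wrapped sequence is returned as a single string.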

Expect
The statistical significance threshold for reporting matches against database sequences. The default value is 10, meaning that 10 matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the expect value (E-value) ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported. Fractional values are acceptable.
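
The thresholding can be sketched in Python using the Karlin-Altschul estimate E = K * m * n * exp(-lambda * S), where S is the raw score, m the query length, and n the database length. The function names are hypothetical, and lam and k depend on the scoring system, so any values passed in are illustrative assumptions rather than the values this server uses.

```python
import math


def expect_value(raw_score, query_len, db_len, lam, k):
    """Karlin-Altschul estimate of chance matches scoring >= raw_score."""
    return k * query_len * db_len * math.exp(-lam * raw_score)


def is_reported(e_value, expect_threshold=10.0):
    """A match is reported only if its E-value does not exceed the threshold."""
    return e_value <= expect_threshold
```

Because the score sits in a negative exponent, a higher-scoring match always has a smaller E-value for the same query and database sizes.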

Word size
Word size for wordfinder algorithm (length of initial exact match).
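
To illustrate what an initial exact match of a given word size means, the Python sketch below indexes every word of the query and looks up each word of a subject sequence. It is a simplified illustration of seeding with exact matches only, not the BLAST implementation, and find_seeds is a hypothetical name.

```python
from collections import defaultdict


def find_seeds(query, subject, word_size):
    """Return (query_pos, subject_pos) pairs where a word matches exactly."""
    index = defaultdict(list)
    for q_pos in range(len(query) - word_size + 1):
        index[query[q_pos:q_pos + word_size]].append(q_pos)

    seeds = []
    for s_pos in range(len(subject) - word_size + 1):
        word = subject[s_pos:s_pos + word_size]
        for q_pos in index.get(word, []):
            seeds.append((q_pos, s_pos))
    return seeds
```

A larger word size yields fewer seeds and a faster but less sensitive search; a smaller word size does the opposite.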

Max target sequences
Maximum number of descriptions and alignments to keep. The default value is 50.

Match/Mismatch scores
For BLASTN, these are the reward for a nucleotide match and the penalty for a nucleotide mismatch.
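
A minimal Python sketch of how a reward and penalty combine into the score of an ungapped nucleotide alignment follows. The default values of 1 and -3 are illustrative assumptions, and ungapped_score is a hypothetical name.

```python
def ungapped_score(query, subject, reward=1, penalty=-3):
    """Sum the reward for each match and the penalty for each mismatch."""
    if len(query) != len(subject):
        raise ValueError("aligned segments must have the same length")
    return sum(reward if q == s else penalty for q, s in zip(query, subject))
```

With these example values, a single mismatch cancels three matches, so a stricter penalty favors alignments between more similar sequences.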

Matrix
Scoring matrix name. The default matrix is BLOSUM62.

Gap costs
Costs to open and extend a gap.
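
As a sketch, and assuming the common affine convention in which a gap of length k costs the opening cost plus k times the extension cost, the total can be computed as follows. The function name is hypothetical, and other tools may charge the first gapped position differently.

```python
def gap_cost(length, gap_open, gap_extend):
    """Affine gap cost: one opening charge plus one extension charge per position."""
    if length <= 0:
        return 0  # no gap, no cost
    return gap_open + gap_extend * length
```

Under this assumption, an opening cost of 11 and an extension cost of 1 make a gap of length 3 cost 14.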

Filter (Low-complexity)
Mask off segments of the query sequence that have low compositional complexity, as determined by the SEG program of Wootton & Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST program of Tatusov and Lipman (in preparation). Filtering can eliminate statistically significant but biologically uninteresting reports from the blast output (e.g., hits against common acidic-, basic- or proline-rich regions), leaving the more biologically interesting regions of the query sequence available for specific matching against database sequences. Filtering is only applied to the query sequence (or its translation products), not to database sequences. Default filtering is DUST for BLASTN, SEG for other programs. It is not unusual for nothing at all to be masked by SEG, when applied to sequences in SWISS-PROT, so filtering should not be expected to always yield an effect. Furthermore, in some cases, sequences are masked in their entirety, indicating that the statistical significance of any matches reported against the unfiltered query sequence should be suspect.
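
The idea of masking low-complexity segments can be illustrated with a simple sliding-window entropy filter in Python. This is a deliberately simplified stand-in and is neither SEG nor DUST; the window size, threshold, and function names are assumptions chosen for illustration.

```python
import math
from collections import Counter


def shannon_entropy(window):
    """Shannon entropy (bits per residue) of the letters in a window."""
    total = len(window)
    counts = Counter(window)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def mask_low_complexity(seq, window=12, threshold=1.0, mask_char="N"):
    """Replace residues in every low-entropy window with mask_char."""
    masked = [False] * len(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for i in range(start, start + window):
                masked[i] = True
    return "".join(mask_char if m else c for c, m in zip(seq, masked))
```

A homopolymer run is masked completely, while a sequence with an even mix of letters passes through unchanged, which mirrors the observation that filtering does not always have an effect.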

Mask for lookup table only
This option masks only for purposes of constructing the lookup table used by BLAST. The BLAST extensions are performed without masking.

Mask for lower case letters
Choose to use lower case filtering in query and subject sequence(s).
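
With lower case filtering, the lower case letters themselves mark the regions to mask. A small Python sketch of locating those regions is given here; lowercase_intervals is a hypothetical name, and the half-open (start, end) convention is an assumption.

```python
import re


def lowercase_intervals(seq):
    """Return half-open (start, end) intervals covering runs of lower case letters."""
    return [(m.start(), m.end()) for m in re.finditer(r"[a-z]+", seq)]
```

For example, a sequence written as ACGTacgtAC has one masked interval covering positions 4 through 7.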

Alignment
Choose to perform ungapped alignment.

Alignment output format
Standard BLAST alignment in pairs of query sequence and database match. For nucleotide, the matches are marked by a pipe symbol ("|") in between query and database sequence. For protein, the identical matches are marked by letter code with "homologous" substitutions (determined by the scoring matrix used) marked by "+" symbol in a line between the query and the database sequence.
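
The nucleotide form of this display can be sketched in Python as follows. The labels "Query" and "Sbjct" and the column spacing are illustrative assumptions, and both function names are hypothetical.

```python
def nucleotide_midline(query, subject):
    """Pipe symbol where aligned bases are identical, space otherwise."""
    return "".join(
        "|" if q == s and q != "-" else " " for q, s in zip(query, subject)
    )


def format_pairwise(query, subject):
    """Render an aligned pair as three lines: query, midline, subject."""
    return "\n".join([
        "Query  " + query,
        "       " + nucleotide_midline(query, subject),
        "Sbjct  " + subject,
    ])
```

Gap characters never receive a pipe symbol, so insertions and deletions stand out as blanks in the middle line.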

Pairwise
The database alignments are anchored to (shown in relation to) the query sequence in pairwise fashion.

Query-anchored with identities
The database alignments are anchored to (shown in relation to) the query sequence. Identities are displayed as dots (.), with mismatches displayed as single letter abbreviations.
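
A minimal Python sketch of the "with identities" rendering, assuming the query and the database sequence are already aligned to the same length (anchor_with_identities is a hypothetical name):

```python
def anchor_with_identities(query, subject):
    """Show the subject relative to the query: '.' for identity, letter otherwise."""
    return "".join("." if q == s else s for q, s in zip(query, subject))
```

For the aligned pair ACGT and ACCT, the database row is drawn as "..C.", so only the differing position shows a letter.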

Query-anchored without identities
Identities are shown as single letter nucleotide abbreviations.

Flat Query-anchored with identities
The 'flat' display shows inserts as deletions on the query. Identities are displayed as dots (.), with mismatches displayed as single letter abbreviations.

Flat Query-anchored without identities
The 'flat' display shows inserts as deletions on the query. Identities are shown as single letter abbreviations.

XML Blast output

Tabular
Simple output in which the alignment information is separated into tab-delimited fields, with the field headers displayed at the top.
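
Tabular output is easy to post-process. The Python sketch below assumes the twelve default columns of standard BLAST tabular output and assumes that header or comment lines start with "#"; the column set this server actually emits may differ, and parse_tabular is a hypothetical name.

```python
# Assumed column order (the standard BLAST tabular default).
FIELDS = [
    "qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
    "qstart", "qend", "sstart", "send", "evalue", "bitscore",
]
INT_FIELDS = ("length", "mismatch", "gapopen", "qstart", "qend", "sstart", "send")
FLOAT_FIELDS = ("pident", "evalue", "bitscore")


def parse_tabular(text):
    """Parse tab-delimited hits into a list of dicts, skipping '#' lines."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue
        hit = dict(zip(FIELDS, line.split("\t")))
        for name in INT_FIELDS:
            hit[name] = int(hit[name])
        for name in FLOAT_FIELDS:
            hit[name] = float(hit[name])
        hits.append(hit)
    return hits
```

Each hit becomes a dictionary keyed by field name, which makes it straightforward to filter or sort hits by E-value or percent identity.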

Tabular with comment lines

Query Genetic Code
Genetic code to be used to translate query.

Database Genetic Code
Genetic code to be used to translate database.

Other parameters
Users can set additional parameters for the BLAST search.

Descriptions
Restricts the number of short descriptions of matching sequences reported to the number specified; default limit is 50 descriptions. See also Expect.

Alignments
Restricts database sequences to the number specified for which high-scoring segment pairs (HSPs) are reported; the default limit is 50. If more database sequences than this happen to satisfy the statistical significance threshold for reporting (see Expect), only the matches ascribed the greatest statistical significance are reported.
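
The interaction between the Expect threshold and these limits can be sketched in Python. The hit representation (a dict with an "evalue" key) and the function name limit_hits are assumptions made for illustration.

```python
def limit_hits(hits, expect_threshold=10.0, max_hits=50):
    """Keep hits that pass the Expect threshold, most significant first."""
    passing = [hit for hit in hits if hit["evalue"] <= expect_threshold]
    passing.sort(key=lambda hit: hit["evalue"])  # smallest E-value first
    return passing[:max_hits]
```

When more hits pass the threshold than the limit allows, only those with the smallest E-values are kept.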