BLAST search

Database and Query

Please select a database.

Program blastn (DNA query vs. DNA database)
Database

Please enter the query sequence data.

File Upload
or COPY & PASTE

BLAST search options

To run the BLAST search with the default settings, leave these parameters unchanged.

Max target sequences
Expect threshold
Word size
Match/Mismatch Scores
Gap costs
Filter
Mask

HELP: BLAST search main options

  • Expect
  • This setting specifies the statistical significance threshold for reporting matches against database sequences. The default value (10) means that 10 such matches are expected to be found merely by chance, according to the stochastic model of Karlin and Altschul (1990). If the statistical significance ascribed to a match is greater than the EXPECT threshold, the match will not be reported. Lower EXPECT thresholds are more stringent, leading to fewer chance matches being reported.
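
As a rough illustration of how the threshold acts, the sketch below computes an expected number of chance matches from the Karlin-Altschul relation E = K * m * n * exp(-lambda * S) and drops matches whose value exceeds the threshold. The function names and the values of `k` and `lam` are placeholders chosen for this example; they are not the parameters this server uses.

```python
import math

def expect_value(raw_score, query_len, db_len, k=0.46, lam=1.28):
    """Expected number of chance matches scoring at least raw_score.

    k and lam are illustrative placeholders; real values depend on the
    scoring system in use.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)

def report_hits(hits, query_len, db_len, threshold=10.0):
    """Keep only hits whose E-value does not exceed the threshold."""
    kept = []
    for name, score in hits:
        e = expect_value(score, query_len, db_len)
        if e <= threshold:
            kept.append((name, e))
    return kept
```

Lowering `threshold` in `report_hits` can only shrink the reported list, which is the sense in which lower EXPECT thresholds are more stringent.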

  • Word-size
  • BLAST is a heuristic that works by finding word matches between the query and database sequences. One may think of this process as finding "hot-spots" that BLAST can then use to initiate extensions that might eventually lead to full-blown alignments. For nucleotide-nucleotide searches (i.e., "blastn") an exact match of the entire word is required before an extension is initiated, so one normally regulates the sensitivity and speed of the search by increasing or decreasing the word size. For other BLAST searches, non-exact word matches are taken into account based upon the similarity between words. Because the amount of similarity can be varied, one normally uses just the word sizes 2 and 3 for these searches.
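
The exact-word seeding step described for blastn can be sketched as follows. This is a simplified illustration, not the BLAST implementation: the function name `find_seeds` is hypothetical, the default `word_size` is just an example value, and the extension phase is omitted.

```python
from collections import defaultdict

def find_seeds(query, subject, word_size=11):
    """Return (query_pos, subject_pos) pairs where an exact word match starts."""
    # Index every word of the query by its starting position.
    index = defaultdict(list)
    for i in range(len(query) - word_size + 1):
        index[query[i:i + word_size]].append(i)

    # Scan the subject and look each of its words up in the index.
    seeds = []
    for j in range(len(subject) - word_size + 1):
        for i in index.get(subject[j:j + word_size], ()):
            seeds.append((i, j))
    return seeds
```

With `query = "ACGTACGT"` and `subject = "TTACGTAA"`, a word size of 4 yields four seeds while a word size of 8 yields none, showing how a larger word size trades sensitivity for speed.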

  • Reward and Penalty for Nucleotide Programs
  • Many nucleotide searches use a simple scoring system that consists of a "reward" for a match and a "penalty" for a mismatch. The (absolute) reward/penalty ratio should be increased as one looks at more divergent sequences. A ratio of 0.33 (1/-3) is appropriate for sequences that are about 99% conserved; a ratio of 0.5 (1/-2) is best for sequences that are 95% conserved; a ratio of about one (1/-1) is best for sequences that are 75% conserved. [States DJ, Gish W, and Altschul SF (1991) METHODS: A companion to Methods in Enzymology 3:66-70.]
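
To make the effect of the ratio concrete, here is a minimal sketch that scores an ungapped alignment of two equal-length sequences under a given reward and penalty. The function name is hypothetical and the sketch ignores gaps entirely.

```python
def ungapped_score(seq_a, seq_b, reward=1, penalty=-3):
    """Sum reward for each matching column and penalty for each mismatch."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(reward if a == b else penalty for a, b in zip(seq_a, seq_b))
```

For two 10-base sequences that differ at one position, the score is 6 with 1/-3 and 8 with 1/-1, so the milder penalty tolerates divergence better.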

  • Gap Cost
  • Increasing the gap costs produces alignments with fewer gaps. The presence of a gap is considered more significant than its length, so gap existence costs are higher than gap extension costs.
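
The relationship between the two costs can be sketched with an affine gap model. The sketch assumes the convention that a gap of length k costs existence + k * extension; the default values and the function name are illustrative only.

```python
def gap_cost(length, existence=5, extension=2):
    """Total cost of a single gap of the given length (affine model)."""
    if length <= 0:
        return 0
    return existence + extension * length
```

Under these example values one gap of length 4 costs 13, whereas four separate gaps of length 1 cost 28 in total, so alignments with fewer, longer gaps are favored.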

  • Filter (Low-complexity)
  • The server filters your query sequence for regions of low compositional complexity. Low-complexity regions commonly give spuriously high scores that reflect compositional bias rather than significant position-by-position alignment. Filtering can eliminate these potentially confounding matches (e.g., hits against proline-rich regions or poly-A tails) from the BLAST reports, leaving regions whose BLAST statistics reflect the specificity of their pairwise alignment. Queries searched with the blastn program are filtered with DUST. Other programs use SEG. Low-complexity sequence found by a filter program is replaced with the letter "N" in nucleotide sequences (e.g., "NNNNNNNNNNNNN") and the letter "X" in protein sequences (e.g., "XXXXXXXXX").
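
The substitution step can be illustrated with a deliberately crude stand-in for a real filter. The sketch below masks only long single-letter runs in an upper-case sequence; it is far simpler than DUST or SEG and does not reproduce their behavior. The function name and the `min_run` parameter are assumptions made for this example.

```python
import re

def mask_homopolymers(sequence, min_run=8, is_protein=False):
    """Replace runs of one repeated letter with "N" (or "X" for protein)."""
    mask_char = "X" if is_protein else "N"
    # (.)\1{n,} matches a character followed by at least n copies of itself.
    pattern = re.compile(r"(.)\1{%d,}" % (min_run - 1))
    return pattern.sub(lambda m: mask_char * len(m.group(0)), sequence)
```

A poly-A tail of ten bases, for instance, comes back as ten "N" characters while the rest of the sequence is left untouched.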

  • Mask Lower Case
  • With this option selected you can cut and paste a FASTA sequence in upper case characters and denote areas you would like filtered with lower case. This allows you to customize what is filtered from the sequence during the comparison to the BLAST databases.
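
Before submitting, it can help to check which stretches of a sequence are marked in lower case. The sketch below only locates those regions; the function name is hypothetical, and how the server treats the marked regions internally is not specified here.

```python
import re

def lowercase_regions(sequence):
    """Return (start, end) half-open intervals covering lower-case runs."""
    return [m.span() for m in re.finditer(r"[a-z]+", sequence)]
```

For `"ACGTacgtACGTaa"` this returns `[(4, 8), (12, 14)]`, the two regions that would be excluded from the comparison.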