[Biopython-dev] PolyA Sequence fails to BLAST?
Sebastian Bassi
sbassi at clubdelarazon.org
Fri Jun 5 14:01:42 EDT 2009
On Fri, Jun 5, 2009 at 2:48 PM, João Rodrigues <anaryin at gmail.com> wrote:
> Hello all, this is quite a general curiosity.
> I was trying my application and I was testing the case of a sequence not
> having matches in BLAST. I chose a long stretch of Alanines, randomly:
> AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
> Finding this odd, because it says "Sequence not in FASTA format",
> I went to the BLAST server page and ran it manually. Same error.
That is because this is a "low complexity" region, which in most cases
is masked (with X or N) before it enters a BLAST search.
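To illustrate what "masking" means here, below is a minimal sketch in Python. It is NOT SEG or DUST: it swaps in a much simpler, hypothetical rule (Shannon entropy of residue composition in a sliding window, with an arbitrary `window` and `threshold`) and replaces every position in a low-entropy window with a mask character. The function names and defaults are illustrative assumptions only.

```python
import math
from collections import Counter


def shannon_entropy(window: str) -> float:
    """Shannon entropy (in bits) of the residue composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 2.0,
                        mask_char: str = "X") -> str:
    """Hypothetical stand-in for SEG/DUST-style filtering.

    Any window whose entropy falls below `threshold` has all of its
    positions replaced by `mask_char` (X for protein, N for nucleotide).
    """
    seq = seq.upper()
    if len(seq) < window:
        # Sequence shorter than one window: judge it as a single window.
        windows = [(0, len(seq))] if seq else []
    else:
        windows = [(i, i + window) for i in range(len(seq) - window + 1)]
    masked = list(seq)
    for start, end in windows:
        if shannon_entropy(seq[start:end]) < threshold:
            for i in range(start, end):
                masked[i] = mask_char
    return "".join(masked)


# A homopolymer has zero entropy in every window, so it is masked entirely.
print(mask_low_complexity("A" * 40))
# Twenty distinct residues: every window has high entropy, nothing is masked.
print(mask_low_complexity("ACDEFGHIKLMNPQRSTVWY"))
```

With a poly-A query every window scores zero, so nothing searchable is left, which matches the behaviour described in this thread.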
Look here:
"Filter (Low-complexity)
Mask off segments of the query sequence that have low
compositional complexity, as determined by the SEG program of Wootton
& Federhen (Computers and Chemistry, 1993) or, for BLASTN, by the DUST
program of Tatusov and Lipman (in preparation). Filtering can
eliminate statistically significant but biologically uninteresting
reports from the blast output (e.g., hits against common acidic-,
basic- or proline-rich regions), leaving the more biologically
interesting regions of the query sequence available for specific
matching against database sequences.
Filtering is only applied to the query sequence (or its
translation products), not to database sequences. Default filtering is
DUST for BLASTN, SEG for other programs.
It is not unusual for nothing at all to be masked by SEG, when
applied to sequences in SWISS-PROT, so filtering should not be
expected to always yield an effect. Furthermore, in some cases,
sequences are masked in their entirety, indicating that the
statistical significance of any matches reported against the
unfiltered query sequence should be suspect."
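The last point (queries masked in their entirety) is exactly the poly-A case. A minimal sketch of how an application might check for it before or after submitting a query, assuming you already have the masked query string (the helper names here are hypothetical, not part of Biopython or BLAST):

```python
def masked_fraction(masked_seq: str, mask_char: str = "X") -> float:
    """Fraction of positions replaced by mask_char (0.0 for an empty sequence)."""
    if not masked_seq:
        return 0.0
    return masked_seq.upper().count(mask_char.upper()) / len(masked_seq)


def check_query(masked_seq: str, mask_char: str = "X") -> str:
    """Summarise how much of a masked query is left for BLAST to search."""
    frac = masked_fraction(masked_seq, mask_char)
    if frac == 1.0:
        return "entirely masked: nothing left to search; unfiltered hits are suspect"
    if frac == 0.0:
        return "nothing masked"
    return f"{frac:.0%} masked"


print(check_query("X" * 41))
print(check_query("XXAA"))
```

Use `mask_char="N"` for nucleotide queries masked by DUST-style filtering.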