[Biopython] Blast query

Peter Cock p.j.a.cock at googlemail.com
Mon Nov 17 15:45:09 UTC 2014


On Fri, Nov 14, 2014 at 10:35 AM, Aisling O'Driscoll
<Aisling.ODriscoll at cit.ie> wrote:
> Hi Peter,
>
> Thanks for the reply. The confusing thing is that Bio.Blast.NCBIWWW.qblast()
> does return the same result as the NCBI BLAST website. So my question is more
> about BLAST than about Biopython, but I couldn’t think of a better place
> to ask it.
>
> My FASTA file contains a gene from the Listeria monocytogenes EGD-e
> chromosome, gi|16802048:2172068-2173591.
>
> Just to verify that this is what it says it is, and for my own curiosity, I
> uploaded the entire FASTA file to the NCBI BLAST website and, as expected,
> it reports that this is indeed the Listeria monocytogenes EGD-e chromosome,
> gi|16802048:2172068-2173591.
>
> However, if I open the FASTA file, copy only the sequence out of it and
> paste that into NCBI BLAST, the top result is Listeria monocytogenes WSLC1001
> complete genome, gi|584465821|gb|CP007160 (the same as my Biopython result).
> I’m trying to understand why the correct gene isn’t identified using just
> the sequence.

Using megablast on the NCBI website against the nt database, with just
the sequence as the query, I get numerous 100% identity hits, including:

Listeria monocytogenes WSLC1001, complete genome
Listeria monocytogenes EGD, complete genome
Listeria monocytogenes strain SLCC2479, serotype 3c
Listeria monocytogenes strain SLCC2372, serotype 1/2c
...
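A list like the one above can also be pulled out of a parsed BLAST XML result. The sketch below assumes a record from `Bio.Blast.NCBIXML.read()`, whose alignments expose `title` and `hsps`, and whose HSPs expose `identities` and `align_length`. The helper name `perfect_full_length_hits` is hypothetical. Because it only reads attributes, it is exercised here with made-up stand-in objects rather than a live search; the "partial hit" entry is invented purely for contrast.

```python
from types import SimpleNamespace


def perfect_full_length_hits(record, query_length):
    """Return titles of hits having an HSP that is 100% identical over the whole query.

    `record` is assumed to look like a Bio.Blast.Record.Blast object:
    record.alignments -> objects with .title and .hsps,
    each HSP with .identities and .align_length.
    """
    titles = []
    for alignment in record.alignments:
        for hsp in alignment.hsps:
            # Every aligned position identical, and the alignment spans the full query
            if hsp.identities == hsp.align_length == query_length:
                titles.append(alignment.title)
                break  # one qualifying HSP per hit is enough
    return titles


# Made-up stand-in record mimicking the shape of a parsed megablast result
fake_record = SimpleNamespace(alignments=[
    SimpleNamespace(title="Listeria monocytogenes WSLC1001, complete genome",
                    hsps=[SimpleNamespace(identities=1524, align_length=1524)]),
    SimpleNamespace(title="Listeria monocytogenes EGD, complete genome",
                    hsps=[SimpleNamespace(identities=1524, align_length=1524)]),
    SimpleNamespace(title="Hypothetical partial hit",
                    hsps=[SimpleNamespace(identities=1400, align_length=1450)]),
])

print(perfect_full_length_hits(fake_record, 1524))
```

With a real search, `record` would come from `NCBIXML.read(handle)` and `query_length` would be the length of the pasted sequence.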

Given that these are all identical 1524 bp hits, the order among them is arbitrary.
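One way to see why the order carries no meaning is to group hits by the statistics they are ranked on. This is a sketch, assuming each hit has been reduced to a `(title, bit score, e-value)` tuple (for example from `hsp.bits` and `hsp.expect` in a Biopython record); the helper `tie_groups` and the scores below are made up for illustration and are not taken from the actual search.

```python
from itertools import groupby


def tie_groups(hits):
    """Group (title, bits, evalue) tuples into runs sharing the same ranking statistics.

    Hits are sorted best first (highest bit score, then lowest e-value).
    Python's sort is stable, so within a group the titles simply keep their
    input order: nothing in the scores distinguishes them.
    """
    ranked = sorted(hits, key=lambda h: (-h[1], h[2]))
    return [
        [title for title, _, _ in group]
        for _, group in groupby(ranked, key=lambda h: (h[1], h[2]))
    ]


# Illustrative values only
hits = [
    ("Listeria monocytogenes WSLC1001, complete genome", 1000.0, 0.0),
    ("Listeria monocytogenes EGD, complete genome", 1000.0, 0.0),
    ("Listeria monocytogenes strain SLCC2479, serotype 3c", 1000.0, 0.0),
    ("Hypothetical weaker hit", 800.0, 1e-150),
]

for rank, group in enumerate(tie_groups(hits), start=1):
    print(rank, group)
```

Shuffling the input changes the order inside the first group but never which hits belong to it, which is the sense in which the website's listing of identical hits is arbitrary.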

I didn't notice any difference in the order of the output when I used the
full FASTA entry as the query.
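To repeat this comparison from Biopython rather than the website, a sketch along these lines may help. It assumes Biopython is installed and network access to NCBI is available, and it uses `Bio.Blast.NCBIWWW.qblast()` with `megablast=True` against `nt`. The helper names `query_forms` and `megablast_titles` are hypothetical; `query_forms` just builds the two query strings (the full FASTA entry with its header, and the bare sequence) from a single-record FASTA text.

```python
def query_forms(fasta_text):
    """Split a single-record FASTA string into (full FASTA entry, bare sequence)."""
    lines = [line.strip() for line in fasta_text.strip().splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("expected a FASTA record starting with '>'")
    bare_sequence = "".join(lines[1:])
    full_entry = "\n".join(lines) + "\n"
    return full_entry, bare_sequence


def megablast_titles(query, hitlist_size=10):
    """Run megablast against nt at NCBI and return hit titles (needs network access)."""
    # Imported lazily so query_forms() works even without Biopython installed
    from Bio.Blast import NCBIWWW, NCBIXML

    handle = NCBIWWW.qblast("blastn", "nt", query,
                            megablast=True, hitlist_size=hitlist_size)
    record = NCBIXML.read(handle)
    handle.close()
    return [alignment.title for alignment in record.alignments]
```

For example, after `full, bare = query_forms(open("query.fasta").read())` (the file name is hypothetical), calling `megablast_titles(full)` and `megablast_titles(bare)` sends the two forms of the query; any difference in the top title among tied 100% hits should not be read as one being the "correct" match.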

Peter
