[Bioperl-l] Species in a BLAST hit

Jason Stajich jason at cgt.duhs.duke.edu
Sun May 9 10:27:39 EDT 2004


Well, it's presumably truncated in the BLAST report itself. Hit->description
only reports what is in the file, so you'll have to get that information
from somewhere else.  The simplest approach, if you have the db which is
being blasted, is to use Bio::Index::Fasta, Bio::DB::Fasta, or Bio::DB::Flat
to retrieve that sequence and get the full description back out from there
($seq->description).  This will be quite fast if you use any of the
indexing systems.
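
To make the idea concrete, here is a rough, language-neutral sketch in Python
of what "look the full description up by ID" amounts to. It is not the
BioPerl API: index_descriptions is a hypothetical helper, the records are
made up, and it builds a throwaway in-memory map rather than the persistent
on-disk index the modules above give you.

```python
import io


def index_descriptions(handle):
    """Map each FASTA record ID to the description text on its header line."""
    descriptions = {}
    for line in handle:
        if line.startswith(">"):
            header = line[1:].rstrip("\n")
            # The ID is the first space-delimited token; the rest is the description.
            seq_id, _, desc = header.partition(" ")
            descriptions[seq_id] = desc.strip()
    return descriptions


# Made-up two-record database standing in for the real FASTA file.
fasta = io.StringIO(
    ">seq1 hypothetical protein [Homo sapiens]\nMKV\n"
    ">seq2 example kinase [Mus musculus]\nMAA\n"
)
index = index_descriptions(fasta)
print(index["seq1"])
```

With a real database you would pass an open file handle instead of the
StringIO object, and use the hit name from the report as the key.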

Another way to do it may be overkill for your needs, but is still
useful, especially if you want to start filtering by different taxonomic
groupings.

If you have a gi number you can look up the taxon id for it using the
gi_taxid_nucl.dmp.gz file in ftp://ftp.ncbi.nih.gov/pub/taxonomy
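
A minimal sketch of that lookup in Python, assuming the dump is a plain
two-column table (gi, then taxon id, one pair per line) and that your hit
names look like "gi|12345|gb|...". The helper names parse_gi and
lookup_taxids are hypothetical, and the rows and hit name below are made up
for illustration.

```python
import io


def parse_gi(hit_name):
    """Pull the GI number out of a hit name such as 'gi|12345|gb|X|'."""
    fields = hit_name.split("|")
    if len(fields) >= 2 and fields[0] == "gi" and fields[1].isdigit():
        return int(fields[1])
    return None  # not a gi-style name


def lookup_taxids(handle, wanted_gis):
    """Scan a 'gi taxid' table once, keeping only the GIs we care about."""
    wanted = set(wanted_gis)
    found = {}
    for line in handle:
        parts = line.split()
        if len(parts) != 2:
            continue  # skip blank or malformed lines
        gi, taxid = int(parts[0]), int(parts[1])
        if gi in wanted:
            found[gi] = taxid
            if len(found) == len(wanted):
                break  # everything resolved, no need to read further
    return found


# For the real file, open it with gzip.open("gi_taxid_nucl.dmp.gz", "rt").
dump = io.StringIO("2\t9913\n3\t9913\n12345\t9606\n")  # made-up rows
gi = parse_gi("gi|12345|gb|AB000001.1|")  # made-up hit name
taxids = lookup_taxids(dump, [gi])
print(taxids)
```

Collecting all the GIs from a report first and resolving them in a single
pass keeps you from rereading a very large file once per hit.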

Then use Bio::DB::Taxonomy - either use Bio::DB::Taxonomy::flatfile to
index the taxonomy data locally, or if you just want the species name you
can do the lookup via Entrez with Bio::DB::Taxonomy::entrez.

===
We don't keyword our mails so I don't know that keyword searches work.
Many people have found this to work:
google.com site:bioperl.org +pipermail "query"

We're discussing maybe making the mailing lists available through a news
server so you can download and index it all locally. There are quite a few
messages there, though, so I don't know that that is what people want to do
either...

Still waiting for that donated google appliance to OBF =)

-jason

On Sun, 9 May 2004, Jonathan Manning wrote:

> Hi all,
>
> Firstly apologies- I am aware this has been discussed in the archives,
> but couldn't find the resolution, so was wondering if there are any
> ideas (also any pointers on how to search the archives by keyword would
> be useful.......!!!).
>
> The problem is that the [organism] information seems to be truncated
> from the description fields in Bio::Search::Hit::BlastHit objects, and
> I'd like to be able to get the information. I've parsed the relevant
> part of the 'name' string, but some of these are fairly non-sensical
> (e.g. URSMA)- although if anyone has a list of what they mean that would
> solve my problem as well.
>
> I'm aware that I could extract the information by retrieving the
> relevant GenBank file, but this would slow things down significantly.
>
> Thanks in advance,
>
> Jon
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at portal.open-bio.org
> http://portal.open-bio.org/mailman/listinfo/bioperl-l
>

--
Jason Stajich
Duke University
jason at cgt.mc.duke.edu
