[Bioperl-l] taxonomy db flatfile: get taxon from gi?

Jason Stajich jason.stajich at gmail.com
Thu Nov 10 21:15:29 UTC 2011


Here's another variant of a script I wrote for my own purposes. The code at the beginning uses a NoSQL solution to store all the ACC -> GI mappings,
and then a second db to store GI -> TAXONID.

This is the case where I have a file of accession numbers and I want to add the taxonomy string to the description line.
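The two-stage lookup described above (ACC -> GI in one on-disk store, GI -> TAXONID in a second) can be sketched roughly as follows. This is not the code from the linked Perl script; it is a minimal illustration in Python using the standard-library dbm module as a stand-in key-value store. The function names, the two-column whitespace-delimited input layout, and the sample accessions and ids are all assumptions made for the example.

```python
import dbm
import os
import tempfile


def parse_two_columns(lines):
    """Yield (first, second) from whitespace-delimited lines, skipping short lines."""
    for line in lines:
        fields = line.split()
        if len(fields) >= 2:
            yield fields[0], fields[1]


def build_store(path, pairs):
    """Stream (key, value) string pairs into an on-disk key-value store.

    Nothing is held in memory beyond the current pair, so the input
    can be much larger than RAM.
    """
    with dbm.open(path, "c") as db:
        for key, value in pairs:
            db[key] = value


def taxid_for_accession(acc, acc2gi_path, gi2taxid_path):
    """Chain the lookups ACC -> GI -> TAXONID; return None if either step misses."""
    with dbm.open(acc2gi_path, "r") as acc2gi, dbm.open(gi2taxid_path, "r") as gi2taxid:
        gi = acc2gi.get(acc)          # values come back as bytes
        if gi is None:
            return None
        taxid = gi2taxid.get(gi)      # bytes keys are accepted as-is
        return taxid.decode() if taxid is not None else None


if __name__ == "__main__":
    workdir = tempfile.mkdtemp()
    acc_db = os.path.join(workdir, "acc2gi")
    gi_db = os.path.join(workdir, "gi2taxid")

    # Hypothetical sample rows; real input would be read line by line from files.
    build_store(acc_db, parse_two_columns(["ACC0001 111", "ACC0002 222"]))
    build_store(gi_db, parse_two_columns(["111\t9606"]))

    for acc in ("ACC0001", "ACC0002"):
        print(acc, taxid_for_accession(acc, acc_db, gi_db))
```

For hundreds of thousands of lookups you would open both stores once and reuse the handles rather than reopening them per query; the per-call open here just keeps the sketch short.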

https://github.com/hyphaltip/mobedac-fungi/blob/master/scripts/taxonomy_lookupmissing.pl

The database-building code is the first 165 lines, and the lookups are basically what you see on line 195.

It would be good to rewrite that script below to use Tokyo Cabinet or Kyoto Cabinet (the newer implementation; I'm not sure whether it is faster).
One thing this approach does is take up a lot of disk space, but you can trade that off against which compression scheme you use, which will affect loading performance.

Jason

On Nov 10, 2011, at 12:51 PM, Bernd Web wrote:

> Hi Anna,
> 
> Jason changed his example script from using hashes to using SQLite:
> bp_classify_hits_kingdom - classify BLAST hits by taxonomic kingdom
> 
> See
> https://github.com/bioperl/bioperl-live/blob/master/scripts/taxa/bp_classify_hits_kingdom.pl
> 
> It's an example script that shows how to do the tax to gi mapping for
> blast reports.
> 
> 
> Bernd
> 
> On Thu, Nov 10, 2011 at 9:01 PM, Anna Friedlander <anna.fr at gmail.com> wrote:
>> Hi all
>> 
>> Does anyone know if there is a way to get a Taxonomy node and/or
>> taxonid from a gi number using the flatfile with taxonomy db?
>> 
>> I have blast output that I want to append taxonomic information to. I
>> have hundreds of thousands of items to do this for, so it's not
>> practical to use entrez to query the NCBI database.
>> 
>> I have the GI->taxid file from the taxonomy ftp but it's 3.2GB so I
>> think much too large to put into a hash!
>> 
>> This was also discussed in 2009:
>> http://bioperl.org/pipermail/bioperl-l/2009-April/029751.html but I
>> don't think there was a conclusion?
>> 
>> Thanks for your help
>> Anna Friedlander
>> 
>> _______________________________________________
>> Bioperl-l mailing list
>> Bioperl-l at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>> 
> 




