[Bioperl-l] Bio::*Taxonomy* changes

Chris Fields cjfields at uiuc.edu
Tue Jul 18 00:24:50 EDT 2006


If you mean genus-species, then yes.  But parent nodes?
If you trust Wikipedia, scientific name == binomial
nomenclature, which could mean no subspecies, strains, etc. if one
were to be really strict about it, though that may be a grey area;
I'm no taxonomist.

http://en.wikipedia.org/wiki/Scientific_name

The parent nodes shouldn't have a scientific name if one were to
adhere strictly to the standard definition above, but NCBI refers to
the names for the parent nodes as 'scientific name' (the XML element
is still ScientificName, just like the child node).  I'm not sure
what the taxdump file uses, though, so that may be different.  Here's
the lineage for TaxId 312284 (marine actinobacterium PHSC20C1).  I
cut out the irrelevant bits and show just the lineage with the
TaxId, scientific name, and rank of each parent node:

       <TaxId>131567</TaxId>
       <ScientificName>cellular organisms</ScientificName>
       <Rank>no rank</Rank>

       <TaxId>2</TaxId>
       <ScientificName>Bacteria</ScientificName>
       <Rank>superkingdom</Rank>

       <TaxId>201174</TaxId>
       <ScientificName>Actinobacteria</ScientificName>
       <Rank>phylum</Rank>

       <TaxId>1760</TaxId>
       <ScientificName>Actinobacteria (class)</ScientificName>
       <Rank>class</Rank>

       <TaxId>52018</TaxId>
       <ScientificName>unclassified Actinobacteria</ScientificName>
       <Rank>no rank</Rank>

       <TaxId>78537</TaxId>
       <ScientificName>unclassified Actinobacteria (miscellaneous)</ScientificName>
       <Rank>no rank</Rank>

....
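
For illustration only, here is a minimal sketch of pulling those three fields out of such a lineage. It is in Python rather than Perl and does not use any BioPerl code. The wrapper elements (LineageEx, Taxon) are an assumption about the parts trimmed from the excerpt above, and parse_lineage is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Assumption: each node in the excerpt is wrapped in a <Taxon> element
# inside a <LineageEx> container; the excerpt shows only the inner fields.
LINEAGE_XML = """
<LineageEx>
  <Taxon><TaxId>131567</TaxId><ScientificName>cellular organisms</ScientificName><Rank>no rank</Rank></Taxon>
  <Taxon><TaxId>2</TaxId><ScientificName>Bacteria</ScientificName><Rank>superkingdom</Rank></Taxon>
  <Taxon><TaxId>201174</TaxId><ScientificName>Actinobacteria</ScientificName><Rank>phylum</Rank></Taxon>
  <Taxon><TaxId>1760</TaxId><ScientificName>Actinobacteria (class)</ScientificName><Rank>class</Rank></Taxon>
</LineageEx>
"""


def parse_lineage(xml_text):
    """Return (taxid, scientific_name, rank) tuples in document order."""
    root = ET.fromstring(xml_text)
    return [
        (int(t.findtext("TaxId")), t.findtext("ScientificName"), t.findtext("Rank"))
        for t in root.iter("Taxon")
    ]


lineage = parse_lineage(LINEAGE_XML)
for taxid, name, rank in lineage:
    # Every node, parent or not, carries a ScientificName value.
    print(f"{taxid}\t{rank}\t{name}")
```

Note that the name is read the same way for every node, whatever its rank, which is how NCBI presents it.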

Seems to me the easiest thing to do here, when looking at a  
particular node, is to use scientific_name() to hold that particular  
element for the node and have binomial represent the true 'scientific  
name', much as Sendu proposed.  It would also make life much easier  
when parsing GenBank/SwissProt/EMBL (SeqIO) to have the data  
designating the formal scientific name (according to NCBI) be  
assigned to a scientific_name() get/set method in Bio::Species for  
later writing; then if we want to delegate this over to  
Bio::Taxonomy::Node from Bio::Species it would be that much easier.
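
A rough sketch of that split, again in Python for illustration and not the Bio::Species or Bio::Taxonomy::Node API. TaxonNode is a hypothetical class, and the rule used for binomial() (species rank, exactly two words) is a deliberately strict assumption, not something settled in this thread.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class TaxonNode:
    """Hypothetical node: stores the name exactly as NCBI gives it."""
    taxid: int
    scientific_name: str  # verbatim ScientificName, for any rank
    rank: str

    def binomial(self) -> Optional[str]:
        # Assumed strict rule: only a two-word name at species rank
        # counts as a binomial; everything else returns None.
        parts = self.scientific_name.split()
        if self.rank == "species" and len(parts) == 2:
            return self.scientific_name
        return None


parent = TaxonNode(2, "Bacteria", "superkingdom")
species = TaxonNode(562, "Escherichia coli", "species")

print(parent.scientific_name, parent.binomial())
print(species.scientific_name, species.binomial())
```

The point is only that scientific_name round-trips unchanged for writing, while binomial is derived separately and may be undefined for parent nodes.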

This would also get around some of the problems I have been seeing
with bacterial names when passing GenBank data through SeqIO, since
you wouldn't have to glop the name together from the way
Bio::Species tries to guess the lineage.

Chris

On Jul 17, 2006, at 9:06 PM, Hilmar Lapp wrote:

>
> On Jul 17, 2006, at 9:55 PM, Chris Fields wrote:
>
>> Leaving the scientific_name as NCBI designates it, though it probably
>> disagrees with ~99% of the world's textbooks, may be the most
>> maintainable solution.
>
> It doesn't disagree, it's quite like what the world's textbooks give
> you as a 'scientific name'.
>
> 	-hilmar
>
> -- 
> ===========================================================
> : Hilmar Lapp  -:-  Durham, NC  -:-  hlapp at gmx dot net :
> ===========================================================
>
>
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign




