[Bioperl-l] Bio::*Taxonomy* changes
Chris Fields
cjfields at uiuc.edu
Tue Jul 18 00:24:50 EDT 2006
If you mean genus-species, then yes. But parent nodes?
If you trust Wikipedia, the scientific name == binomial
nomenclature. Which could mean no subspecies, strains, etc. if one
were to be really strict about it, though that may be a grey area;
I'm no taxonomist.
http://en.wikipedia.org/wiki/Scientific_name
The parent nodes shouldn't have a scientific name if one were to
adhere strictly to the standard definition above, but NCBI refers to
the names for the parent nodes as 'scientific name' (the XML element
is still ScientificName, just like the child node). I'm not sure
what the tax dump file uses, though, so that may be different. Here's
the lineage for Taxid 312284 (marine actinobacterium PHSC20C1). I
cut out the irrelevant bits and just show the lineage with all the
parent nodes, taxID, and rank:
<TaxId>131567</TaxId>
<ScientificName>cellular organisms</ScientificName>
<Rank>no rank</Rank>
<TaxId>2</TaxId>
<ScientificName>Bacteria</ScientificName>
<Rank>superkingdom</Rank>
<TaxId>201174</TaxId>
<ScientificName>Actinobacteria</ScientificName>
<Rank>phylum</Rank>
<TaxId>1760</TaxId>
<ScientificName>Actinobacteria (class)</ScientificName>
<Rank>class</Rank>
<TaxId>52018</TaxId>
<ScientificName>unclassified Actinobacteria</ScientificName>
<Rank>no rank</Rank>
<TaxId>78537</TaxId>
<ScientificName>unclassified Actinobacteria (miscellaneous)</ScientificName>
<Rank>no rank</Rank>
....
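For anyone who wants to poke at a lineage like this outside of BioPerl, here is a minimal Python sketch that pulls the (TaxId, ScientificName, Rank) triples out of a flat fragment shaped like the excerpt above. It assumes the three elements always appear in that order for each node, which is true of the excerpt but is not checked against the full NCBI XML; the names parse_lineage and NODE_RE are made up for this example.

```python
import re

# Flat fragment copied from the lineage excerpt (first three nodes only).
LINEAGE_XML = """
<TaxId>131567</TaxId>
<ScientificName>cellular organisms</ScientificName>
<Rank>no rank</Rank>
<TaxId>2</TaxId>
<ScientificName>Bacteria</ScientificName>
<Rank>superkingdom</Rank>
<TaxId>201174</TaxId>
<ScientificName>Actinobacteria</ScientificName>
<Rank>phylum</Rank>
"""

# Assumption: each node is a TaxId, then a ScientificName, then a Rank.
NODE_RE = re.compile(
    r"<TaxId>(\d+)</TaxId>\s*"
    r"<ScientificName>(.*?)</ScientificName>\s*"
    r"<Rank>(.*?)</Rank>",
    re.DOTALL,
)

def parse_lineage(fragment):
    """Return one dict per node, in the order the nodes appear."""
    return [
        {
            "taxid": int(taxid),
            # Collapse whitespace in case a name was wrapped across lines.
            "scientific_name": " ".join(name.split()),
            "rank": rank.strip(),
        }
        for taxid, name, rank in NODE_RE.findall(fragment)
    ]

nodes = parse_lineage(LINEAGE_XML)
for node in nodes:
    print(node["taxid"], node["rank"], node["scientific_name"], sep="\t")
```

Note that every node, parent or not, comes back with a scientific_name, which is the NCBI usage described above.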
Seems to me the easiest thing to do here, when looking at a
particular node, is to use scientific_name() to hold that particular
element for the node and have binomial represent the true 'scientific
name', much as Sendu proposed. It would also make life much easier
when parsing GenBank/SwissProt/EMBL (SeqIO) to have the data
designating the formal scientific name (according to NCBI) be
assigned to a scientific_name() get/set method in Bio::Species for
later writing; then if we want to delegate this over to
Bio::Taxonomy::Node from Bio::Species it would be that much easier.
This would also get around some of the problems I have been seeing
with bacterial names when passing GenBank data through SeqIO, since
you wouldn't need to glue the name together from Bio::Species's
guess at the lineage.
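To make the split concrete, here is a rough sketch of the idea in Python rather than Perl. It is not the Bio::Species or Bio::Taxonomy::Node API; the class TaxonNode and the rule used by binomial() (take the name of the nearest node ranked 'species', walking up through the parents) are assumptions for illustration only, and the example names and ranks are typed in by hand, not fetched from NCBI.

```python
class TaxonNode:
    """Hypothetical node: stores the name exactly as the source gives it."""

    def __init__(self, scientific_name, rank, parent=None):
        self.scientific_name = scientific_name  # NCBI-style name, kept verbatim
        self.rank = rank
        self.parent = parent

    def binomial(self):
        """Name of this node or its nearest ancestor with rank 'species'.

        Assumed rule for this sketch; returns None when no such node exists,
        e.g. for a parent node such as a phylum.
        """
        node = self
        while node is not None:
            if node.rank == "species":
                return node.scientific_name
            node = node.parent
        return None


# Illustrative data only.
bacteria = TaxonNode("Bacteria", "superkingdom")
species = TaxonNode("Escherichia coli", "species", parent=bacteria)
strain = TaxonNode("Escherichia coli K-12", "no rank", parent=species)

print(strain.scientific_name)  # name as stored on the node itself
print(strain.binomial())       # name taken from the species-rank ancestor
print(bacteria.binomial())     # no species at or above this node
```

The point of the design is that nothing is reassembled from a guessed lineage: the stored name is written back out as-is, and the binomial is derived separately when it is wanted.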
Chris
On Jul 17, 2006, at 9:06 PM, Hilmar Lapp wrote:
>
> On Jul 17, 2006, at 9:55 PM, Chris Fields wrote:
>
>> Leaving the scientific_name as NCBI designates it, though it probably
>> disagrees with ~99% of the world's textbooks, may be the most
>> maintainable solution.
>
> It doesn't disagree, it's quite like what the world's textbooks give
> you as a 'scientific name'.
>
> -hilmar
>
> --
> ===========================================================
> : Hilmar Lapp -:- Durham, NC -:- hlapp at gmx dot net :
> ===========================================================
Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign