[Bioperl-l] Bio::Taxonomy changes

Tue Jul 18 17:03:54 UTC 2006

Chris Fields wrote:
>> What about the existing genus(), species(), sub_species() and variant()
>> methods? There would be no need for any logic to join things together,
>> but I would still like to be able to get just 'sapiens' from somewhere.
>> Can I use species() for that purpose (though again, species is strictly
>> 'Homo sapiens')? Likewise sub_species() and variant() could hold the
>> remaining non-redundant names. Or should all of these be deprecated
>> because they don't really have a place in a generic Node class?
> 
> This is where Hilmar suggests that you have a bit of freedom in doing what
> you want, as with binomial().  So species() should return species
> ('sapiens'), genus() should return genus, etc.

[regarding changes to Bio::Taxonomy::Node]

Actually, I'm really strongly leaning toward getting rid of the 
following methods and new() options (and giving up entirely on being 
able to keep 'sapiens' somewhere):

-organelle, organelle()
-division, division()
-sub_species, sub_species()
-variant, variant()
species(), validate_species_name()
genus()
binomial()

As far as I can see, none of these methods has any place in a generic 
Node class. If you want to know your species, the node has to have a 
rank() of 'species', and then you just call scientific_name(). Methods 
like those above belong in something like Bio::Species, NOT in Node. 
Does anyone disagree? Can anyone offer a justification for keeping these 
methods?

Changes I haven't yet discussed but have already made (but not committed):

*parent_taxon_id = \&parent_id;
*common_name = \&common_names;
-factory and factory() removed, since there is no module implementing 
Bio::Taxonomy::FactoryI, nothing in Node makes use of a factory once 
set, and a factory seems redundant when the node already has a -dbh.
validate_name() removed because it just returns 1.
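
For anyone unfamiliar with the glob assignments in the list above, here is a minimal standalone sketch of how that kind of aliasing behaves. Demo::Node is a hypothetical stand-in class, not the real Bio::Taxonomy::Node, and the IDs are just sample values.

```perl
use strict;
use warnings;

# Hypothetical stand-in class, NOT the real Bio::Taxonomy::Node.
package Demo::Node;

sub new {
    my ($class, %args) = @_;
    return bless { parent_id => $args{-parent_id} }, $class;
}

# Combined getter/setter.
sub parent_id {
    my $self = shift;
    $self->{parent_id} = shift if @_;
    return $self->{parent_id};
}

{
    # Silences a possible "used only once" warning for the alias name.
    no warnings 'once';
    # Glob assignment: parent_taxon_id becomes a second name for the same sub.
    *parent_taxon_id = \&parent_id;
}

package main;

my $node = Demo::Node->new(-parent_id => 9605);
$node->parent_taxon_id(207598);    # set through the alias
print $node->parent_id, "\n";      # read back through the original name
```

Both names resolve to the same code reference, so a subclass that overrides only parent_id() will not automatically change what parent_taxon_id() does.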

>> What about node_name()? Yet another synonym of scientific_name? (right
>> now it grabs the common name(s)). Ugh.
> 
> I agree things need cleaning up.  You could always make node_name() an alias
> for scientific_name() though it could just be deprecated.

Actually, I've gone with node_name() as the 'pure' and preferred method 
for setting the name of your node, and made scientific_name() an alias 
of it (though it behaves as suggested earlier in the thread).

>> What should I do with the classification array? Should it hold the raw
>> ScientificName like:
>> join(',', $node->classification) eq 'Homo sapiens, Homo,
>> Homo/Pan/Gorilla group [...]'?

(I've decided to do it the above way for consistency with scientific_name)

>> Or should it be like:
>> join(',', $node->classification) eq 'sapiens, Homo, Homo/Pan/Gorilla
>> group [...]'?
> 
> Don't know what the dump file gives; the XML output using efetch via entrez
> has the raw lineage (as appears in a GenBank sequence file) and the actual
> full lineage with TaxID, rank, 'scientific name,' in the actual lineage
> order.  I think one problem area will be the 'no rank' designations in the
> lineage.  Note that the below example also has a species and no genus;
> tricky!

Currently, flatfile and entrez ignore nodes with a rank of 'no rank' 
when they build the classification array. I had no intention of changing 
this behaviour.
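
A rough sketch of that filtering, assuming a lineage held as a plain list of name/rank pairs (leaf first). The data structure and the abbreviated sample lineage are illustrative only; this is not the actual flatfile or entrez code.

```perl
use strict;
use warnings;

# Illustrative, abbreviated lineage, leaf first. Sample data only.
my @lineage = (
    { name => 'Homo sapiens',       rank => 'species'      },
    { name => 'Homo',               rank => 'genus'        },
    { name => 'Hominidae',          rank => 'family'       },
    { name => 'Eukaryota',          rank => 'superkingdom' },
    { name => 'cellular organisms', rank => 'no rank'      },
);

# Keep only nodes with a real rank, then collect their names.
my @classification =
    map  { $_->{name} }
    grep { $_->{rank} ne 'no rank' } @lineage;

print join(', ', @classification), "\n";
```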

>       <TaxId>1760</TaxId>
>       <ScientificName>Actinobacteria (class)</ScientificName>
>       <Rank>class</Rank>

Ugh. I guess my proposal to remove <> bits via flatfile extends to 
removing () bits via entrez. We don't need unique names; we can use 
object_id() when uniqueness matters.
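
Something along these lines is what I have in mind for the cleanup. strip_qualifier() is a hypothetical helper, not an existing BioPerl method, and it assumes the qualifier is a single trailing (...) or <...> chunk; names that legitimately end in a bracketed part would need more care.

```perl
use strict;
use warnings;

# Hypothetical helper, not part of BioPerl: drop one trailing "(...)" or
# "<...>" qualifier together with the whitespace in front of it.
sub strip_qualifier {
    my ($name) = @_;
    $name =~ s/\s*(?:\([^()]*\)|<[^<>]*>)\s*$//;
    return $name;
}

print strip_qualifier('Actinobacteria (class)'), "\n";  # name from the XML quoted above
print strip_qualifier('Bacillus <bacterium>'), "\n";    # illustrative flatfile-style name
print strip_qualifier('Homo sapiens'), "\n";            # no qualifier, left unchanged
```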

>> I don't think binomial() would serve any useful purpose now, however.
>
> We could use binomial() for the 'scientific name' as the rest of the world
> knows it (as in binomial nomenclature), having it built from genus-species
> like you had originally suggested.

No, see above. I don't think it makes the slightest bit of sense for a 
Node to go around trying to build things from a parent it may or may not 
have. Again, binomial() is a method for something like Bio::Species, not 
a generic Node class.
