[Bioperl-l] Bio::*Taxonomy* changes
bix at sendu.me.uk
Tue Jul 18 17:03:54 UTC 2006
Chris Fields wrote:
>> What about the existing genus(), species(), sub_species() and variant()
>> methods? There would be no need for any logic to join things together,
>> but I would still like to be able to get just 'sapiens' from somewhere.
>> Can I use species() for that purpose (though again, species is strictly
>> 'Homo sapiens')? Likewise sub_species() and variant() could hold the
>> remaining non-redundant names. Or should all of these be deprecated
>> because they don't really have a place in a generic Node class?
> This is where Hilmar suggests that you have a bit of freedom in doing what
> you want, as with binomial(). So species() should return species
> ('sapiens'), genus return genus, etc.
[regarding changes to Bio::Taxonomy::Node]
Actually, I'm really strongly leaning toward getting rid of the
following methods and new() options (and giving up entirely on being
able to keep 'sapiens' somewhere):
As far as I can see, none of these methods has any place in a generic
Node class. If you want to know the species, your node has to have a
rank() of 'species', and then you just call scientific_name(). The above
kind of methods belong in something like Bio::Species, NOT in Node.
Does anyone disagree? Can anyone offer a justification for keeping these
methods?
Changes I haven't yet discussed but have already made (but not committed):
*parent_taxon_id = \&parent_id;
*common_name = \&common_names;
-factory and factory() removed, since there is no
Bio::Taxonomy::FactoryI-implementing module, nothing in Node to make use
of a factory once set, and a factory seems redundant when we're a node
with a -dbh.
validate_name() removed because it just returns 1.
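For anyone less familiar with the idiom, the two alias lines above are plain Perl typeglob assignments: each installs a second name for an existing sub in the package's symbol table. A minimal sketch of the mechanism, assuming a hypothetical My::Node package (a stand-in for illustration, not the real Bio::Taxonomy::Node code) and made-up id values:

```perl
use strict;
use warnings;

package My::Node;    # hypothetical stand-in, not Bio::Taxonomy::Node itself

sub new {
    my ($class, %args) = @_;
    return bless { parent_id => $args{-parent_id} }, $class;
}

# Combined getter/setter for the parent's id.
sub parent_id {
    my $self = shift;
    $self->{parent_id} = shift if @_;
    return $self->{parent_id};
}

# Typeglob assignment: parent_taxon_id becomes another name for the same sub.
{
    no warnings 'once';
    *parent_taxon_id = \&parent_id;
}

package main;

my $node = My::Node->new(-parent_id => 9605);    # illustrative id only
print $node->parent_taxon_id, "\n";    # same value parent_id() returns
$node->parent_taxon_id(207598);        # set through the alias...
print $node->parent_id, "\n";          # ...and read back through the original
```

Because both names point at the same code ref, the alias cannot drift out of sync with the original method, which is the appeal of this approach over writing a wrapper sub.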
>> What about node_name()? Yet another synonym of scientific_name? (right
>> now it grabs the common name(s)). Ugh.
> I agree things need cleaning up. You could always make node_name() an alias
> for scientific_name() though it could just be deprecated.
Actually, I've gone with node_name as the 'pure' and preferred method for
setting the name of your node, and made scientific_name an alias of it
(though it behaves as suggested earlier in the thread).
>> What should I do with the classification array? Should it hold the raw
>> ScientificName like:
>> join(',', $node->classification) eq 'Homo sapiens, Homo,
>> Homo/Pan/Gorilla group [...]'?
(I've decided to do it the above way for consistency with scientific_name)
>> Or should it be like:
>> join(',', $node->classification) eq 'sapiens, Homo, Homo/Pan/Gorilla
>> group [...]'?
> Don't know what the dump file gives; the XML output using efetch via entrez
> has the raw lineage (as appears in a GenBank sequence file) and the actual
> full lineage with TaxID, rank, 'scientific name,' in the actual lineage
> order. I think one problem area will be the 'no rank' designations in the
> lineage. Note that the below example also has a species and no genus;
Currently, flatfile and entrez ignore nodes with a rank of 'no rank'
when they build the classification array. I had no intention of changing
that.
> <ScientificName>Actinobacteria (class)</ScientificName>
Ugh. I guess my proposal to remove <> bits via flatfile extends to
removing () bits via entrez. We don't need unique names; we can use
object_id() when uniqueness matters.
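As a rough illustration of the stripping I have in mind, here is a minimal sketch. The helper name strip_qualifier is hypothetical (nothing by that name exists in BioPerl), and 'Bacillus <bacterium>' is only an assumed example of a flatfile-style unique name. It removes a single trailing `<...>` or `(...)` qualifier and leaves everything else alone:

```perl
use strict;
use warnings;

# Hypothetical helper (not part of BioPerl): drop one trailing
# "<...>" or "(...)" qualifier, plus surrounding whitespace, from a name.
sub strip_qualifier {
    my ($name) = @_;
    $name =~ s/\s*(?:<[^<>]*>|\([^()]*\))\s*$//;
    return $name;
}

# Names without a trailing qualifier pass through unchanged.
print strip_qualifier($_), "\n"
    for 'Actinobacteria (class)', 'Bacillus <bacterium>', 'Homo sapiens';
```

A blanket rule like this would also strip parenthesised text that is genuinely part of a name, so a real implementation would probably need to be more selective about which qualifiers it removes.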
>> I don't think binomial() would serve any useful purpose now, however.
> We could use binomial() for the 'scientific name' as the rest of the world
> knows it (as in binomial nomenclature), having it built from genus-species
> like you had originally suggested.
No, see above. I don't think it makes the slightest bit of sense for a
Node to go around trying to build things from a parent it may or may not
have. Again, binomial() is a method for something like Bio::Species, not
a generic Node class.