[Bioperl-l] Bio::*Taxonomy* changes

Chris Fields cjfields at uiuc.edu
Mon Jul 24 21:53:46 UTC 2006


> > I'll repeat:  a Node and a Species is-not-a Taxonomy.
> 
> I'll repeat: A Node is a Node and a Bio::Species is a Taxonomy ;)

Nope.  I think this is incorrect.  Here's why.

Let's look at the reasons Bio::Taxonomy was started, shall we?

From perldoc Bio::Taxonomy:

DESCRIPTION
    Bio::Taxonomy object represents any rank-level in taxonomy system,
    rather than Bio::Species which is able to represent only species-level. 
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^

From perldoc Bio::Taxonomy::Node:

DESCRIPTION
    This is the next generation (for Bioperl) of representing Taxonomy
    information. Previously all information was managed by a single object
    called Bio::Species. This new implementation allows representation of
    the intermediate nodes not just the species nodes and can relate their
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
    connections.

Bioperl wiki:

http://www.bioperl.org/wiki/Project_priority_list#Taxonomy_.2F_Species_data
http://www.bioperl.org/wiki/Module:Bio::Species

Both talk about delegating or replacing Bio::Species with
Bio::Taxonomy::Node.

Every one of those indicates what the original idea for Bio::Taxonomy::Node
was (an eventual replacement for Bio::Species).  Even the original methods
for Bio::Taxonomy::Node are the same as those of Bio::Species.  So, according
to this alone, Bio::Species would eventually be replaced by
Bio::Taxonomy::Node.

I wanted an easier transition to Node from Bio::Species (hell, just a few
changes and using Bio::Taxonomy::Node worked fine!), but your proposals
made sense.  I saw having a Species-based Tax object as a nice compromise,
but Hilmar has made a few good points: would we have a Bio::Species object
around knowing what we know now?  When Bio::Species was originally designed,
it was probably before the NCBI Tax database existed.  I think it has
outlasted its current use.

I have posted a response to Hilmar.  I think we should just get rid of
Bio::Species altogether and have a Taxonomy::Node contain the basic data
(scientific_name(), common_names(), etc).  And remove any SeqIO parsing of
genus/species to simplify everything.  All this extra parsing and
hand-wringing over trying to get species/genus information from a GenBank
file just mucks up ORGANISM and SOURCE line parsing anyway.  Simplify it.
Simple is good.

Radical?  Yes, but I agree with him that Bio::Species has outlasted its
use.  As for organelle and lineage information, they could be placed in
SimpleValue objects.  If anyone wants to grab tax information, they can use
the Node object to get it but they'll need a local flatfile database or
network connection to do so.  This also means there is no need for a
Bio::DB::Taxonomy factory: just return Node objects directly.  Each format
(flatfile and entrez) currently works this way anyway, correct?  Simplifies
that.  Simple is better.
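
To make the shape of that proposal concrete, here is a rough sketch.  It is
written in Python only to keep it short, and every name in it (Node,
ToyTaxonomyDB, get_node, lineage, the toy ids) is made up for illustration;
none of it is the actual Bio::Taxonomy::Node or Bio::DB::Taxonomy API.  The
assumption is that a node carries only its own basic data and that lineage
is looked up through whatever database handle is available.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable, List, Optional


@dataclass
class Node:
    """One taxon at any rank (hypothetical stand-in, not Bio::Taxonomy::Node)."""
    taxid: int
    scientific_name: str
    rank: str
    parent_id: Optional[int] = None
    common_names: List[str] = field(default_factory=list)


class ToyTaxonomyDB:
    """In-memory stand-in for a local flatfile or a network lookup."""

    def __init__(self, nodes: Iterable[Node]) -> None:
        self._by_id: Dict[int, Node] = {n.taxid: n for n in nodes}

    def get_node(self, taxid: Optional[int]) -> Optional[Node]:
        # Returns the Node directly; there is no separate factory layer.
        if taxid is None:
            return None
        return self._by_id.get(taxid)

    def lineage(self, node: Node) -> List[str]:
        # Walk parent links upward; the node itself stores no lineage.
        names: List[str] = []
        current = self.get_node(node.parent_id)
        while current is not None:
            names.append(current.scientific_name)
            current = self.get_node(current.parent_id)
        return names


# Toy data: the ids are arbitrary and do not correspond to NCBI taxids.
db = ToyTaxonomyDB([
    Node(1, "Hominidae", "family"),
    Node(2, "Homo", "genus", parent_id=1),
    Node(3, "Homo sapiens", "species", parent_id=2, common_names=["human"]),
])
human = db.get_node(3)
print(human.scientific_name, db.lineage(human))
```

The point of the sketch is that genus and species are just ranks on ordinary
nodes, so nothing has to be parsed out of a name string.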

Of course, we couldn't get rid of Bio::Species until all the following were
shifted over to Node somehow:  ; >

Instances: 2    BP Module : Bio::Cluster::SequenceFamily
Instances: 4    BP Module : Bio::Cluster::UniGene
Instances: 1    BP Module : Bio::Cluster::UniGeneI
Instances: 1    BP Module : Bio::DB::FileCache
Instances: 3    BP Module : Bio::DB::GFF::Segment
Instances: 1    BP Module : Bio::DB::Taxonomy::flatfile
Instances: 2    BP Module : Bio::Graph::IO::psi_xml
Instances: 1    BP Module : Bio::Map::CytoMap
Instances: 1    BP Module : Bio::Map::LinkageMap
Instances: 3    BP Module : Bio::Map::MapI
Instances: 3    BP Module : Bio::Map::SimpleMap
Instances: 3    BP Module : Bio::Matrix::PSM::InstanceSite
Instances: 6    BP Module : Bio::Phenotype::Correlate
Instances: 1    BP Module : Bio::Phenotype::OMIM::OMIMentry
Instances: 3    BP Module : Bio::Phenotype::OMIM::OMIMparser
Instances: 5    BP Module : Bio::Phenotype::Phenotype
Instances: 2    BP Module : Bio::Phenotype::PhenotypeI
Instances: 4    BP Module : Bio::Seq
Instances: 3    BP Module : Bio::SeqI
Instances: 2    BP Module : Bio::SeqIO::agave
Instances: 4    BP Module : Bio::SeqIO::bsml
Instances: 2    BP Module : Bio::SeqIO::bsml_sax
Instances: 1    BP Module : Bio::SeqIO::chadoxml
Instances: 1    BP Module : Bio::SeqIO::chaos
Instances: 4    BP Module : Bio::SeqIO::embl
Instances: 2    BP Module : Bio::SeqIO::entrezgene
Instances: 3    BP Module : Bio::SeqIO::game::seqHandler
Instances: 4    BP Module : Bio::SeqIO::genbank
Instances: 2    BP Module : Bio::SeqIO::kegg
Instances: 2    BP Module : Bio::SeqIO::locuslink
Instances: 4    BP Module : Bio::SeqIO::swiss
Instances: 2    BP Module : Bio::SeqIO::table
Instances: 2    BP Module : Bio::SeqIO::tigr
Instances: 2    BP Module : Bio::SeqIO::tigrxml
Instances: 7    BP Module : Bio::SeqIO::tinyseq
Instances: 4    BP Module : Bio::Taxonomy
Instances: 1    BP Module : Bio::Taxonomy::Node
Instances: 6    BP Module : Bio::Taxonomy::Taxon
Instances: 9    BP Module : Bio::Taxonomy::Tree
Instances: 5    BP Module : Bio::Tools::Analysis::Protein::ELM
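
For anyone who wants to re-check that tally as modules get shifted over, a
count like this could be regenerated with something along the following
lines.  This is only a sketch and is not the script that produced the list
above; it assumes the .pm files sit under a root directory that contains
Bio/, and the function names are invented for the example.

```python
import re
from pathlib import Path
from typing import Dict, List

# Match the literal package name, but not longer names that merely start with it.
PATTERN = re.compile(r"Bio::Species\b")


def path_to_module(path: Path, root: Path) -> str:
    """Turn root/Bio/SeqIO/genbank.pm into Bio::SeqIO::genbank."""
    rel = path.relative_to(root).with_suffix("")
    return "::".join(rel.parts)


def count_mentions(root: Path) -> Dict[str, int]:
    """Count textual mentions of Bio::Species in every .pm file under root."""
    counts: Dict[str, int] = {}
    for path in sorted(root.rglob("*.pm")):
        text = path.read_text(errors="replace")
        hits = len(PATTERN.findall(text))
        if hits:
            counts[path_to_module(path, root)] = hits
    return counts


def format_report(counts: Dict[str, int]) -> List[str]:
    """Render the counts in the same layout as the list above."""
    return [
        f"Instances: {hits}    BP Module : {module}"
        for module, hits in sorted(counts.items())
    ]
```

Calling format_report(count_mentions(Path("."))) from the top of a checkout
would give one line per module.  Note that this counts every textual
mention, including ones in POD and comments, so the numbers are an upper
bound on the code that actually needs changing.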

Chris




