[BioRuby] [Wg-phyloinformatics] bioruby classes for phyloxml support

Wed May 27 03:02:20 UTC 2009

On Tue, May 26, 2009 at 3:25 AM, Raoul JP Bonnal <bonnalraoul at ingm.it> wrote:

> Christian M Zmasek wrote:
>
>> I agree it is better to extend existing classes, as opposed to changing them
>> drastically.
>> One thing to keep in mind is that many attributes are composed of
>> multiple fields themselves, i.e. you would need to create a class for them
>> (if such a class does not already exist).
>> The most important element, besides sequence, is the taxonomy class.
>>
>> Since BioRuby does not contain a general purpose taxonomy class at this
>> point, it might be worth spending some time in designing such a class.
>>
>> I propose a taxonomy class with the following elements:
>> -scientific name (e.g. Nematostella vectensis)
>> -common name (e.g. starlet sea anemone)
>> -code (or mnemonic, as used by Swiss-Prot) (e.g. NEMVE)
>> -rank (e.g. species)
>>
>> phyloxml also has a URI for taxonomies, but I am not sure if this is
>> important for a general taxonomy class.
>>
>> On the other hand, a general taxonomy class might also have
>> - authority (e.g. Stephenson, 1935)
>> - aliases []
>> (if these elements are considered important, they of course could be added
>> to the next version of phyloxml)
>>
>> What do people think about this?
>>
> How do other languages represent that class? I think that defining a new
> class is an opportunity to define a similar API across the Bio* languages.
> The taxonomy class could then be used by biosequence objects
> representing or fetching data from BioSQL, for example.
>
> http://code.open-bio.org/svnweb/index.cgi/biosql/checkout/biosql-schema/trunk/doc/biosql-ERD.pdf
>
> --
> Ra
>
>
I looked at BioPerl, and this is what I could extract and understand from
the documentation (I have almost no experience with Perl):

Bio::Taxonomy::Node
  * rank (species, genus, order etc)
  * id (NCBI taxonomy id in most cases)
  * parent_id (NCBI taxonomy id in most cases)
  * genetic_code
  * mitochondrial_genetic_code
  * create_date (date this node was created in the database)
  * update_date (date this node was updated in the database)
  * pub_date (date this node was published in the database)
  * scientific_name
  * common_names []

I don't think create_date, update_date and pub_date are necessary for
our purposes, since they are database-specific information; if you need
them, you can get them from the database.

I didn't find a taxonomy/taxon class in BioPython.

Here is BioJava. Its taxon class represents taxonomy information from the NCBI
database. BioSQL has an almost identical taxon class.

NCBITaxon class
  * acronym
  * common name
  * equivalent name
  * genetic code
  * hidden
  * left value
  * mitochondrial genetic code
  * node rank
  * parent
  * right value
  * scientific name
  * synonym

BioRuby has Bio::SQL::Taxon and Bio::SQL::TaxonName classes, but they are
empty.
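
To make the comparison concrete, here is a minimal sketch of what a
general-purpose taxonomy class might look like in Ruby. It combines the
fields proposed above (scientific name, common name, code, rank, authority,
aliases) with the id and parent_id found in BioPerl. The class name
Taxonomy, the constructor keywords and the common_name helper are all
assumptions for illustration; nothing here exists in BioRuby.

```ruby
# Hypothetical general-purpose taxonomy class. The name and fields are
# assumptions drawn from this thread, not an existing BioRuby API.
class Taxonomy
  attr_accessor :scientific_name, :common_names, :code, :rank,
                :id, :parent_id, :authority, :aliases

  def initialize(scientific_name:, common_names: [], code: nil, rank: nil,
                 id: nil, parent_id: nil, authority: nil, aliases: [])
    @scientific_name = scientific_name
    @common_names    = Array(common_names) # accept a single name or a list
    @code            = code                # e.g. a Swiss-Prot style mnemonic
    @rank            = rank
    @id              = id                  # e.g. an NCBI taxonomy id
    @parent_id       = parent_id
    @authority       = authority
    @aliases         = Array(aliases)
  end

  # First common name, for callers that expect a single value.
  def common_name
    @common_names.first
  end

  # Human-readable label built from whichever fields are present.
  def to_s
    label = scientific_name.to_s
    label += " (#{common_name})" if common_name
    label += " [#{code}]" if code
    label
  end
end

taxon = Taxonomy.new(
  scientific_name: "Nematostella vectensis",
  common_names: ["starlet sea anemone"],
  code: "NEMVE",
  rank: "species"
)
puts taxon
```

Keeping common_names as a list follows BioPerl, while the common_name
helper covers the single-name case from the original proposal. The
database-specific dates are left out on purpose.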