Bioperl: Bio::Species.pm: fixed bug #226

Ewan Birney birney@ebi.ac.uk
Sun, 7 May 2000 14:25:11 +0100 (BST)


On Sat, 6 May 2000, Hilmar Lapp wrote:

> Hello all,
> 
> as encouraged by Jason, Ewan, and James, I've corrected the bug in Species.pm
> mentioned in report #226, so far only on branch-06.
> 
> The fix involved a couple of changes, most notably a change in what
> Bio::Species->classification() now expects when it is passed an argument.
> The reason is that all methods now consistently access the same array, except
> for common_name(). Bio::SeqIO::genbank.pm and Bio::SeqIO::embl.pm already
> behaved as if Species.pm worked the way it does now, which triggered the bug.

Hmmm. I am worried that, for species that do not have a subspecies
identifier, we will falsely classify the subspecies as a species.

Can you reassure me that this won't happen?
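The case-based recognition described in the classification() documentation quoted below can be sketched as follows. This is a minimal, hypothetical helper (`normalize_classification` is not part of Bio::Species). It assumes that species and subspecies are lower case and that the genus and every higher rank start with an upper-case letter. It pads a missing subspecies with an empty string so that every accessor can index the same array.

```perl
use strict;
use warnings;

# Hypothetical helper, not the actual Bio::Species implementation.
# Assumption: species and subspecies are lower case; genus and every
# higher rank (family ... kingdom) start with an upper-case letter.
sub normalize_classification {
    my @class = @_;
    die "classification needs at least the species\n" unless @class;

    # Old (0.60) convention: SPECIES, GENUS, ... -> the second element,
    # if any, is a Title-case genus, so no subspecies was supplied.
    # New convention: SUBSPECIES, SPECIES, GENUS, ... -> the second
    # element is a lower-case species.
    if (@class == 1 || $class[1] =~ /^[A-Z]/) {
        unshift @class, '';    # empty placeholder for the missing subspecies
    }

    # Index 0 is always the subspecies, 1 the species, 2 the genus,
    # so every accessor can read the same array.
    return @class;
}

my @human = normalize_classification(
    qw(sapiens Homo Hominidae Catarrhini Primates Eutheria Mammalia));
printf "subspecies='%s' species='%s' genus='%s'\n", @human[0 .. 2];
# prints: subspecies='' species='sapiens' genus='Homo'
```

With correctly cased input the heuristic is unambiguous. It also shows where the concern above bites: a genus mistakenly given in lower case, as in `qw(sapiens homo Hominidae)`, makes the second element look like a species, so `sapiens` ends up in the subspecies slot.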

> 
> See the doc quote at the end.
> 
> I'm wondering whether there's any point in fixing the main branch as well
> right away. A diff of Species.pm against the main-branch version showed a few
> other, minor differences, which I don't think I introduced. So it seems to me
> that the two branches are not necessarily kept in sync, even for changes that
> introduce no API changes or new features. Could someone let me know whether I
> should fix the main branch as well?
> 


<wince> I left the main trunk somewhat out of sync with the branch when I
made the 0.6 release. (Dirty secret. Hands slapped all around.) Can you
propagate across the changes you think are required?

> Regarding design, I have the impression that Bio::Species encapsulates the
> source of a sequence (hence the organelle() method) rather than only a species,
> and that's exactly the point of interest with respect to seqs (like organ,
> tissue, library etc.). So, intuitively I'd call it something like
> Bio::SeqSource or Bio::IsolationSource. Does anyone see a point in this? I
> have no idea how much attention the list community pays to such issues, so
> maybe someone can tell me whether this is just stupid and you leave such
> things to the OO/CORBA/Java domain.
> 

I think the species object should handle biological species *only*.
Issues like organelle are halfway between species issues and source
issues; I could argue it either way. In my view, a good design is
something like


  Bio::IsolationSource has-a Bio::Species object

and also has additional methods for tissue etc.
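A minimal sketch of this has-a design in Perl follows. The package names `Example::Species` and `Example::IsolationSource`, their constructor arguments, and the `binomial()`, `tissue()` and `organelle()` accessors are illustrative assumptions, not existing Bioperl classes. For brevity the species object stores the older SPECIES, GENUS, ... order, and `organelle` is placed on the source side, which is only one of the two choices mentioned above.

```perl
use strict;
use warnings;

# Hypothetical species class: holds biological taxonomy only.
package Example::Species;

sub new {
    my ($class, @classification) = @_;    # SPECIES, GENUS, ... KINGDOM
    return bless({ classification => [@classification] }, $class);
}

sub binomial {
    my $self = shift;
    my ($species, $genus) = @{ $self->{classification} }[0, 1];
    return "$genus $species";
}

# Hypothetical source class: has-a species object, plus source-only data.
package Example::IsolationSource;

sub new {
    my ($class, %args) = @_;
    return bless({
        species   => $args{species},      # an Example::Species object
        organelle => $args{organelle},    # could arguably live on the species
        tissue    => $args{tissue},
    }, $class);
}

sub species   { $_[0]{species} }
sub organelle { $_[0]{organelle} }
sub tissue    { $_[0]{tissue} }

package main;

my $source = Example::IsolationSource->new(
    species   => Example::Species->new(qw(sapiens Homo Hominidae)),
    organelle => 'mitochondrion',
    tissue    => 'liver',
);
print $source->species->binomial, "\n";    # prints "Homo sapiens"
```

Keeping taxonomy in its own object means a parser can fill the species part from one place in an entry and the tissue or organelle from another (e.g. the feature table) without the species class having to know about either.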


Beware: the EMBL/GenBank format spreads this information out across a
number of different places, including the feature table (wait for it:
chimeric clones make life **very** difficult). We should keep our heads and
design what we feel are good objects, making sure that a large percentage
of EMBL/GenBank entries will fit nicely.


I refuse to let the bioperl object model be dominated by EMBL/GenBank
parsing issues. It is an evolutionary dead-end in my view.



> In addition, Jason advised me to post these things (bug-fixes, discussion
> stuff) here, but maybe I've misunderstood him and I'm too technical for this
> list. If so, please let me know, I'm new to all this.
> 

This is bang on track. Keep on posting on this...


> Cheers,
> 
> 	Hilmar
> 
> Quoted from the updated documentation:
> 
> =head2 classification
> 
>  Title   : classification
>  Usage   : $self->classification(@class_array);
>            @classification = $self->classification();
>  Function: Fills or returns the classification list in
>            the object.  The array provided must be in
>            the order SUBSPECIES, SPECIES, GENUS ---> KINGDOM.
>            The first and second elements of the array, the subspecies
>            and species, must be in lower case, and the rest in title
>            case.  Only the species is required.
> 
>            Note that the format convention given above changed after
>            release 0.60. Formerly, SUBSPECIES was not part of the array.
>            In order to break as few scripts as possible, the method tries
>            to recognize whether or not the subspecies is provided, assuming
>            the remaining elements are in the correct case. This is why
>            the example given below is still valid.
>  Example : $obj->classification(qw( sapiens Homo Hominidae
>            Catarrhini Primates Eutheria Mammalia Vertebrata
>            Chordata Metazoa Eukaryota));
>  Returns : Classification array
>  Args    : Classification array
> 
> =cut
> 
> -- 
> -----------------------------------------------------------------------
> Hilmar Lapp                                      email: hlapp@gmx.net
> NFI Vienna, IFD/Bioinformatics                   phone: +43 1 86634 631
> A-1235 Vienna                                      fax: +43 1 86634 727
> ROI: Bioinformatics (arrays, expression, seqs), Programming, Databases,
>      Mountain Biking (hard tail, hard fork: feel the trail)
> -----------------------------------------------------------------------
> =========== Bioperl Project Mailing List Message Footer =======
> Project URL: http://bio.perl.org/
> For info about how to (un)subscribe, where messages are archived, etc:
> http://www.techfak.uni-bielefeld.de/bcd/Perl/Bio/vsns-bcd-perl.html
> ====================================================================
> 

-----------------------------------------------------------------
Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
<birney@ebi.ac.uk>. 
-----------------------------------------------------------------
