[Bioperl-l] getting/setting species names with Bio::Species
Mark A. Jensen
maj at fortinbras.us
Fri Jan 15 16:10:02 UTC 2010
excellent summary--thanks!!
----- Original Message -----
From: "Chris Fields" <cjfields at illinois.edu>
To: "Mark A. Jensen" <maj at fortinbras.us>
Cc: "BioPerl List" <bioperl-l at lists.open-bio.org>
Sent: Friday, January 15, 2010 11:00 AM
Subject: Re: [Bioperl-l] getting/setting species names with Bio::Species
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>
>
> That's the way Bio::Species is supposed to work, at least since it was
> refactored by Sendu. But just a note: Bio::Species was considered deprecated
> in favor of Bio::Taxon (scheduled for the 1.7 release, IIRC), for many very
> good reasons. First and foremost among these is the fact that we cannot
> consistently parse out the genus/species/strain/variant/etc. for every
> organism in GenBank w/o knowing its full lineage, which means including some
> taxonomic information. And even then it's highly problematic.
>
> We've had several heated discussions on list about how to handle this in a
> somewhat backwards-compatible way, and the main solution was to forgo
> backwards compatibility and eventually deprecate Bio::Species
> altogether in favor of Bio::Taxon, a class that doesn't make the same
> assumptions. Bio::Species, in the interim, is-a Bio::Taxon. You'll note that
> a minimal Bio::DB::Taxonomy instance is constructed from the classification
> scheme in some instances, but if one had a proper DB link one could link to
> Entrez Taxonomy or a local indexed flat-file DB and grab the info. Bio::Taxon
> (correct me if I'm wrong on this, Sendu, if you're out there) eschews various
> methods (species, etc.) for simpler, consistent ones based on Taxonomy, and
> doesn't force us to handle every exception to getting the genus/species out of
> a name. That is left up to the user, at their peril.
>
> For either one, if you are reproducing the fully qualified name, you probably
> should use something like node_name() for consistency. Bio::Species also has
> scientific_name(). With a true Bio::Taxon one would need to check that this
> is performed on the species node.
>
> chris
>
> On Jan 15, 2010, at 9:31 AM, Mark A. Jensen wrote:
>
>> I'm not that familiar with Bio::Species either, but this looks
>> like conflicting semantics between Bio::Species and Bio::SeqIO.
>> Bio::SeqIO sets the species accessor to the 'species' element of
>> the lineage array, I believe.
>> FWIW, I'd prefer "binomial" = "genus" . "species"
>> MAJ
>> ----- Original Message -----
>> From: "Dave Messina" <David.Messina at sbc.su.se>
>> To: "BioPerl List" <bioperl-l at lists.open-bio.org>
>> Sent: Friday, January 15, 2010 10:17 AM
>> Subject: [Bioperl-l] getting/setting species names with Bio::Species
>>
>>
>>> Hi everybody,
>>>
>>> I'm having a little trouble with names in Bio::Species objects.
>>>
>>> According to the Bio::Species documentation, if I have a species name as a
>>> string, like "Homo sapiens", I can get and set that using the species
>>> method:
>>>
>>> use Bio::Species;
>>>
>>> my $my_species_obj = Bio::Species->new();
>>> $my_species_obj->species('Homo sapiens');
>>>
>>> print $my_species_obj->species; # 'Homo sapiens'
>>>
>>>
>>> That works fine if I create the Bio::Species object myself.
>>>
>>> But if I try to get that string back out from a Bio::Species object created
>>> by SeqIO from a genbank file, I get just 'sapiens' back:
>>>
>>> use Bio::SeqIO;
>>>
>>> my $io = Bio::SeqIO->new('-format' => 'genbank',
>>>                          '-file'   => 'hoxa2.gb');
>>> my $seq_obj = $io->next_seq;
>>> my $io_species_obj = $seq_obj->species;
>>>
>>> print $io_species_obj->species; # 'sapiens'
>>>
>>>
>>> I think that happens because genbank records have more taxonomic info about
>>> the species name, like the genus (and in fact the whole taxonomic
>>> categorization: kingdom, phylum, order, etc.). So the genus is stored
>>> separately.
>>>
>>> Poking around a bit more in Bio::Species, I turned up the method 'binomial',
>>> which appears to do the right thing, returning genus and species in both
>>> cases. Except, as you can see, the space is stripped out for my
>>> species-name-is-just-a-string object:
>>>
>>> print $my_species_obj->binomial; # 'Homosapiens'
>>> print $io_species_obj->binomial; # 'Homo sapiens'
>>>
>>>
>>> I'm not very familiar with Bio::Species (and its parent Bio::Taxon); am I
>>> using it correctly above, or is there a better way?
>>>
>>> If not, this kinda looks like a bug to me. I've got a patch which works and
>>> passes the BioPerl test suite.
>>>
>>>
>>> Thanks,
>>> Dave
>>>
>>> _______________________________________________
>>> Bioperl-l mailing list
>>> Bioperl-l at lists.open-bio.org
>>> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
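
A minimal sketch of the "binomial" = "genus" . "species" behaviour discussed
in this thread, written in plain Perl with no BioPerl dependency. The helper
name compose_binomial is hypothetical; it is not a Bio::Species or Bio::Taxon
method and is not the patch mentioned above. It only illustrates joining the
two parts with a single space while tolerating a species value that already
holds the full name, which is the case where plain concatenation gave
'Homosapiens'.

```perl
use strict;
use warnings;

# Hypothetical helper (an assumption for illustration, not BioPerl API):
# build a binomial from a genus and a species value that may or may not
# already include the genus.
sub compose_binomial {
    my ($genus, $species) = @_;

    # Treat missing values as empty strings.
    $genus   = '' unless defined $genus;
    $species = '' unless defined $species;

    # Trim surrounding whitespace from both parts.
    for ($genus, $species) { s/^\s+//; s/\s+$//; }

    # If the species value already starts with the genus ('Homo sapiens'),
    # return it unchanged rather than producing 'Homo Homo sapiens'.
    return $species
        if length($genus) && $species =~ /^\Q$genus\E\s+\S/;

    # Otherwise join the non-empty parts with exactly one space.
    return join ' ', grep { length $_ } $genus, $species;
}

print compose_binomial('Homo', 'sapiens'), "\n";       # genus stored separately
print compose_binomial('Homo', 'Homo sapiens'), "\n";  # full name in species
print compose_binomial(undef,  'Homo sapiens'), "\n";  # no genus set at all
```

All three calls print 'Homo sapiens'. As noted in the thread, splitting a name
into genus and species is not reliable for every organism, so a helper like
this only covers the simple two-part case.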