[Bioperl-l] Re: Bio::Species changes

Ewan Birney birney@ebi.ac.uk
Mon, 7 Oct 2002 07:53:20 +0100 (BST)


On Sun, 6 Oct 2002, Hilmar Lapp wrote:

> I committed a number of changes. These are the changes concerning
> Bio::Species:
>
> - there is now a second way of calling classification:
> 	$species->classification(\@classif_array, "FORCE");
> In set mode, the first parameter can now be an array reference. If
> it is, and a second parameter is present and evaluates to TRUE, no
> name validation whatsoever will be done.
>
> - new method variant() (get/set)
> This will hold the (potentially literal) information regarding the
> variant of the species, like a strain, isolate, variety, etc. I
> modified the swissprot parser to extract this information properly.
> By potentially literal I mean that e.g. swissprot gives this in a
> form like '(strain A/Equine/Fontainebleau/76)' and except for the
> enclosing parentheses the parser will not change this string (i.e.,
> 'strain' remains there).
>
> I'm now at 65k sequences of swissprot rel. 40 into biosql, and
> there's still some fall-out, for which I'll commit fixes soon. Other
> than that, the results look pretty good by now.
>
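
To make the variant() behaviour described above concrete, here is a minimal
sketch in plain Perl. It is not the actual swissprot parser code:
strip_enclosing_parens is a hypothetical helper, and it assumes the variant
arrives as a single parenthesised string like the example quoted above.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of Bioperl): drop one pair of enclosing
# parentheses and leave the rest of the string untouched, so a leading
# word such as 'strain' stays in place.
sub strip_enclosing_parens {
    my ($raw) = @_;
    (my $variant = $raw) =~ s/^\s*\((.*)\)\s*$/$1/;
    return $variant;
}

my $variant = strip_enclosing_parens('(strain A/Equine/Fontainebleau/76)');
print "$variant\n";
```

A string without enclosing parentheses passes through unchanged, which
matches the "potentially literal" handling described in the quoted text.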


This is great, Hilmar - getting Bioperl to sanely parse all of
swissprot/embl/genbank and represent it sensibly is going to be a big, big
win for 1.2.


You mentioned in your commit that you have gone for nested annotations -
is this needed for swissprot parsing? I get a little concerned about
nested things, as I feel they often allow people to come up with complex,
hard-to-convey semantics about the nesting.