[Biopython-dev] EMBL flatfile parsing
Peter
biopython-dev at maubp.freeserve.co.uk
Mon Feb 19 11:24:29 UTC 2007
Peter wrote:
>>> Does this sound like a sensible way to include EMBL support?
>>>
>>> ...
>
> This took longer than I expected, but it's done now.
Has anyone had a chance to try out the revised EMBL/GenBank parser?
I could ask on the main list, but testing the EMBL parsing would
require installing the CVS release (or updating just Bio/GenBank and
Bio/SeqIO by hand), and that seems a bit much to ask.
There are three main things I would like feedback on:
(a) Has any existing code using Bio.GenBank been affected at all?
(b) Does Bio.SeqIO read your favourite EMBL/GenBank files?
(c) How does parsing a file as "genbank-cds" or "embl-cds" look?
That is, this returns each CDS feature, with its stated amino acid
translation, as a SeqRecord. Does anyone else think that getting the
genes themselves in this way is a useful option? I'm not sure about the
simplistic code used to choose the SeqRecord id/name/description - this
is difficult as there is a lot of variation in annotation conventions.
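
To make (c) concrete, here is a rough pure-Python sketch of the idea:
collect each CDS feature from an EMBL feature table and pair its
/translation with some identifier. This is NOT the Bio.GenBank code,
only an illustration. The function name, the sample record, and the
fallback order for the identifier (protein_id, then locus_tag, then
gene) are all assumptions made for the example. With Biopython itself
the equivalent would be along the lines of
SeqIO.parse(handle, "embl-cds").

```python
def cds_translations(embl_text):
    """Return (identifier, protein) pairs, one per translated CDS.

    Hypothetical helper for illustration; not part of Biopython.
    """
    features = []  # each entry: [feature key, list of raw qualifier strings]
    for line in embl_text.splitlines():
        if not line.startswith("FT"):
            continue  # only feature-table lines matter here
        body = line[2:]
        if body[3:4].strip():
            # Text in column 6 starts a new feature, e.g. "CDS   1..30"
            features.append([body.split()[0], []])
        elif features:
            text = body.strip()
            if text.startswith("/"):
                features[-1][1].append(text)  # a new qualifier
            elif features[-1][1]:
                # Wrapped qualifier value. Joined without a space, which
                # suits /translation; free-text values would need one.
                features[-1][1][-1] += text
            # Otherwise this is a wrapped location line, ignored here.

    results = []
    for key, raw_qualifiers in features:
        if key != "CDS":
            continue
        qualifiers = {}
        for raw in raw_qualifiers:
            name, _, value = raw[1:].partition("=")
            qualifiers[name] = value.strip('"')
        if "translation" not in qualifiers:
            continue  # e.g. pseudogenes carry no stated translation
        # Assumed fallback order; real annotation conventions vary a lot.
        ident = (qualifiers.get("protein_id")
                 or qualifiers.get("locus_tag")
                 or qualifiers.get("gene")
                 or "<unknown>")
        results.append((ident, qualifiers["translation"]))
    return results


# Invented record, shaped like an EMBL feature table, for demonstration.
SAMPLE = """\
ID   X00001; SV 1; linear; genomic DNA; STD; PRO; 30 BP.
FT   source          1..30
FT                   /organism="Example organism"
FT   CDS             1..30
FT                   /gene="abcA"
FT                   /protein_id="CAA00001.1"
FT                   /translation="MKVLA
FT                   AGIVG"
FT   CDS             complement(4..12)
FT                   /locus_tag="EX_0002"
FT                   /translation="MST"
//
"""

for ident, protein in cds_translations(SAMPLE):
    print(ident, protein)
```

The second CDS has no protein_id, so the sketch falls back to its
locus_tag. That is exactly the kind of variation that makes choosing
the id/name/description awkward.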
> I should probably add some EMBL examples to the SeqIO unit test...
I have added a single record EMBL file to the test suite.
Peter