[Biopython-dev] EMBL flatfile parsing
Peter
biopython-dev at maubp.freeserve.co.uk
Wed Feb 7 17:04:04 UTC 2007
Michiel Jan Laurens de Hoon wrote:
> Peter wrote:
>> Does this sound like a sensible way to include EMBL support?
>>
>> ...
>
> Either way is fine with me. We can do the Bronx release in the near
> future, and do another release when the EMBL stuff is done. But it's up
> to you.
This took longer than I expected, but it's done now.
There is a new file, Bio/GenBank/Scanner.py, containing a base "INSDC
scanner" class that handles the code common to both formats (e.g. feature
tables), plus two subclasses: GenBankScanner and EmblScanner.
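To illustrate the base-class/subclass split, here is a minimal sketch. It is not the actual code in Bio/GenBank/Scanner.py, and the class names (InsdcScannerSketch, GenBankScannerSketch, EmblScannerSketch) are hypothetical. It assumes that in both formats the feature key starts in column 6 and qualifiers in column 22, after a five-character prefix (five spaces in GenBank, "FT   " in EMBL). Under that assumption, the feature-table parser can live in the base class, and each subclass only needs to know how to find its own feature-table lines. Continuation lines are joined with a single space, which is a simplification.

```python
class InsdcScannerSketch:
    """Hypothetical base class holding the feature-table logic both formats share."""

    PREFIX_LEN = 5  # assumption: feature keys start in column 6 in GenBank and EMBL

    def feature_lines(self, lines):
        """Yield the raw feature-table lines; each subclass knows its own layout."""
        raise NotImplementedError

    def parse_features(self, lines):
        features = []
        for line in self.feature_lines(lines):
            body = line[self.PREFIX_LEN:].rstrip()
            if not body.strip():
                continue  # skip blank feature-table lines
            if body[0] != " ":
                # A non-blank character in column 6 starts a new feature.
                parts = body.split(None, 1)
                location = parts[1] if len(parts) > 1 else ""
                features.append({"key": parts[0], "location": location, "qualifiers": []})
            elif body.lstrip().startswith("/"):
                # An indented line beginning with "/" starts a new qualifier.
                features[-1]["qualifiers"].append(body.strip())
            elif features[-1]["qualifiers"]:
                # Otherwise it continues the previous qualifier value (joined with a space)...
                features[-1]["qualifiers"][-1] += " " + body.strip()
            else:
                # ...or a location that wrapped onto the next line.
                features[-1]["location"] += body.strip()
        return features


class GenBankScannerSketch(InsdcScannerSketch):
    def feature_lines(self, lines):
        in_table = False
        for line in lines:
            if line.startswith("FEATURES"):
                in_table = True
            elif in_table and line[:1] != " ":
                break  # an unindented line such as ORIGIN ends the table
            elif in_table:
                yield line


class EmblScannerSketch(InsdcScannerSketch):
    def feature_lines(self, lines):
        # EMBL marks every feature-table line with the "FT" line type.
        return [line for line in lines if line.startswith("FT ")]


GENBANK_LINES = """LOCUS       TEST                      10 bp    DNA
FEATURES             Location/Qualifiers
     source          1..10
                     /organism="Example"
     gene            3..8
                     /note="a note that wraps
                     onto two lines"
ORIGIN
        1 acgtacgtac
//""".splitlines()

EMBL_LINES = """ID   TEST; SV 1; linear; genomic DNA; STD; UNC; 10 BP.
FH   Key             Location/Qualifiers
FT   source          1..10
FT                   /organism="Example"
FT   gene            3..8
FT                   /note="a note that wraps
FT                   onto two lines"
SQ   Sequence 10 BP;
     acgtacgtac                                                        10
//""".splitlines()

gb_features = GenBankScannerSketch().parse_features(GENBANK_LINES)
embl_features = EmblScannerSketch().parse_features(EMBL_LINES)
print(gb_features == embl_features)  # True
```

The point of the split is that a fix to the shared feature-table parser benefits both formats at once, while each subclass deals only with its own header and line-prefix conventions.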
I have updated Bio/GenBank/__init__.py to remove my old GenBank-only
scanner and call the new GenBankScanner instead.
I have also updated Bio.SeqIO to use this new code for both GenBank and
EMBL formats.
http://www.biopython.org/wiki/SeqIO
Note: The handling of newlines and whitespace has changed slightly as
a result of these changes, so I updated the expected output for the
test_GenBank unit test. Incidentally, I think this "fixes" Bug 1981:
http://bugzilla.open-bio.org/show_bug.cgi?id=1981
Other than that, touch wood, nothing should have changed for GenBank
users. The relevant unit tests look fine.
The EMBL support still has a few bits that need polishing (search for TODO in
Bio/GenBank/Scanner.py for points that I noted at the time), and it still
needs rigorous testing, of course.
I should probably add some EMBL examples to the SeqIO unit test...
Peter