[Biopython-dev] EMBL flatfile parsing

Peter biopython-dev at maubp.freeserve.co.uk
Wed Feb 7 17:04:04 UTC 2007


Michiel Jan Laurens de Hoon wrote:
> Peter wrote:
>> Does this sound like a sensible way to include EMBL support?
>>
>> ...
> 
> Either way is fine with me. We can do the Bronx release in the near 
> future, and do another release when the EMBL stuff is done. But it's up 
> to you.

This took longer than I expected, but it's done now.

There is a new file Bio/GenBank/Scanner.py which includes a base "INSDC 
scanner" which handles the common code (e.g. feature tables) with two 
subclasses, a GenBankScanner and an EmblScanner.
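
To illustrate why a shared base class works here, below is a minimal sketch. It is NOT the code in Bio/GenBank/Scanner.py, and every class and method name in it is hypothetical. It relies on one fact about both formats: feature-table lines have the same layout once you drop a five-character prefix (five spaces in GenBank, "FT   " in EMBL). The feature key starts in column 6 and the location or qualifier text starts in column 22. With that layout, the base class can parse the table and each subclass only has to say where the table is.

```python
"""Sketch of a shared INSDC feature-table parser with GenBank/EMBL subclasses.

All names here are hypothetical; this only illustrates the base-class and
subclass split, not the real Bio/GenBank/Scanner.py implementation.
"""


class InsdcScannerSketch:
    """Hypothetical base class holding the feature-table logic both formats share."""

    def feature_lines(self, lines):
        """Yield feature-table lines with the 5-character prefix removed.

        Only this method is format-specific: each subclass knows where its
        feature table is and what the line prefix looks like.
        """
        raise NotImplementedError

    def parse_features(self, lines):
        """Return a list of {"key", "location", "qualifiers"} dicts.

        After the prefix is stripped, the feature key fills the first 16
        columns and the location or qualifier text starts at column 16.
        """
        features = []
        for line in self.feature_lines(lines):
            key = line[:16].strip()
            value = line[16:].strip()
            if not key and not value:
                continue  # blank feature-table line
            if key:
                # A new feature: its key plus (the start of) its location.
                features.append({"key": key, "location": value, "qualifiers": []})
            elif not features:
                continue  # ignore anything before the first feature key
            elif value.startswith("/"):
                # A new qualifier such as /gene="abc".
                features[-1]["qualifiers"].append(value)
            elif features[-1]["qualifiers"]:
                # Simplification: a wrapped qualifier value, joined with a space.
                features[-1]["qualifiers"][-1] += " " + value
            else:
                # A wrapped location, e.g. "join(1..10," then "20..30)".
                features[-1]["location"] += value
        return features


class GenBankScannerSketch(InsdcScannerSketch):
    """GenBank: the table follows FEATURES and its lines are indented."""

    def feature_lines(self, lines):
        in_table = False
        for line in lines:
            if not line.strip():
                continue
            if line.startswith("FEATURES"):
                in_table = True
            elif in_table and not line.startswith(" "):
                break  # an unindented keyword such as ORIGIN ends the table
            elif in_table:
                yield line[5:]


class EmblScannerSketch(InsdcScannerSketch):
    """EMBL: every feature-table line carries the "FT" line code."""

    def feature_lines(self, lines):
        for line in lines:
            if line.startswith("FT"):
                yield line[5:]


if __name__ == "__main__":
    embl_record = [
        "ID   TEST; SV 1; linear; DNA; STD; HUM; 100 BP.",
        "FT   " + "source".ljust(16) + "1..100",
        "FT   " + " " * 16 + '/organism="Homo sapiens"',
        "SQ   Sequence 100 BP;",
    ]
    print(EmblScannerSketch().parse_features(embl_record))
```

The continuation handling is deliberately simplified. For example, a wrapped qualifier line that happens to begin with "/" would be mistaken for a new qualifier. A real scanner needs more care there.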

I have updated Bio/GenBank/__init__.py to remove my old GenBank-only
scanner and to call the new GenBankScanner instead.

I have also updated Bio.SeqIO to use this new code for both GenBank and 
EMBL formats.

http://www.biopython.org/wiki/SeqIO

Note: The handling of newlines and whitespace has changed slightly as
a result of these changes.  I updated the expected output for the
test_GenBank unit test.  Incidentally, I think this "fixes" Bug 1981:

http://bugzilla.open-bio.org/show_bug.cgi?id=1981

Other than that, touch wood, nothing should have changed for GenBank 
users.  The relevant unit tests look fine.

The EMBL support has a few bits that need polishing (search for TODO in
Bio/GenBank/Scanner.py for points that I noted at the time), and it
still needs some rigorous testing, of course.

I should probably add some EMBL examples to the SeqIO unit test...

Peter
