Bioperl: Bug in EMBL parser: expects FH/FT lines

Peter van Heusden pvh@egenetics.com
Thu, 15 Jun 2000 10:10:02 +0200 (SAST)


Hi All

There are two bugs in the Bio::SeqIO::embl module: one in the way it
parses EMBL entries, and one in the way it writes them.

As I read section 3.3, "Structure of an Entry", of the EMBL manual
(http://www.ebi.ac.uk/embl/Documentation/User_manual/structure_entry.html)
there is a defined order of particular lines, but there is no requirement
that every type of line be present. In particular, an entry may have
0 or more FH/FT lines.

Unfortunately, the embl.pm next_seq() function explicitly expects ID,
then various optional things, then FH, then FT, then SQ. This means that
an EMBL entry without FT lines is not only ignored, but also causes
next_seq() to read to the end of the file, discarding all lines along the
way. A rather major bug, in my opinion.
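
To make the intended behaviour concrete, here is a minimal sketch of a
reader that dispatches on each line's two-letter code instead of
demanding a fixed ID/FH/FT/SQ sequence. It is written in Python purely
for illustration and is not the embl.pm code; the function name
read_embl_entries, the dict layout and the sample entries are all
assumptions made up for this example.

```python
import io

# Two made-up entries: the first has no FH/FT lines at all.
SAMPLE = """\
ID   TEST01     standard; DNA; UNC; 12 BP.
XX
SQ   Sequence 12 BP; 3 A; 3 C; 3 G; 3 T; 0 other;
     acgtacgtac gt                                                            12
//
ID   TEST02     standard; DNA; UNC; 8 BP.
XX
FH   Key             Location/Qualifiers
FH
FT   source          1..8
XX
SQ   Sequence 8 BP; 2 A; 2 C; 2 G; 2 T; 0 other;
     ttggccaa                                                                  8
//
"""


def read_embl_entries(handle):
    """Yield one dict per entry (hypothetical helper, not a Bioperl API)."""
    entry = None
    in_sequence = False
    for raw in handle:
        line = raw.rstrip("\n")
        if line.startswith("//"):
            # End of entry: emit whatever was collected, features or not.
            if entry is not None:
                yield entry
            entry, in_sequence = None, False
            continue
        code, body = line[:2], line[5:]
        if code == "ID":
            fields = body.split()
            name = fields[0].rstrip(";") if fields else ""
            entry = {"id": name, "features": [], "seq": ""}
            in_sequence = False
        elif entry is None:
            continue  # stray line outside any entry
        elif code == "SQ":
            in_sequence = True  # sequence data follows until "//"
        elif in_sequence:
            # Keep only the residues; drop spacing and the trailing count.
            entry["seq"] += "".join(ch for ch in line if ch.isalpha())
        elif code == "FT":
            entry["features"].append(body)
        # Every other code (XX, FH, DE, ...) is simply skipped here.


entries = list(read_embl_entries(io.StringIO(SAMPLE)))
```

Because nothing waits for an FH line, the entry without features is
returned like any other and the following entry is still read.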

Secondly, and the reason I discovered this, the embl.pm
write_seq() function always writes the FH lines, and only writes FT lines
if there are features present. Surely it should check to see if there are
features before writing FH lines?
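
The check I have in mind on the writing side would look roughly like
the sketch below. Again this is illustrative Python, not the
write_seq() code; format_embl_entry is a hypothetical name, the
features are assumed to be simple (key, location) pairs, and the ID
and SQ lines are deliberately simplified.

```python
def format_embl_entry(seq_id, sequence, features=()):
    """Return a simplified EMBL-style entry as a string (illustrative only)."""
    lines = [f"ID   {seq_id}     standard; DNA; UNC; {len(sequence)} BP.", "XX"]
    if features:
        # The feature table header is written only when there is a table.
        lines += ["FH   Key             Location/Qualifiers", "FH"]
        for key, location in features:
            lines.append(f"FT   {key:<16}{location}")
        lines.append("XX")
    lines.append(f"SQ   Sequence {len(sequence)} BP;")
    for start in range(0, len(sequence), 60):
        lines.append("     " + sequence[start:start + 60])
    lines.append("//")
    return "\n".join(lines) + "\n"


plain = format_embl_entry("TEST01", "acgtacgtacgt")
annotated = format_embl_entry("TEST02", "ttggccaa", [("source", "1..8")])
```

Here plain contains neither FH nor FT lines, while annotated has the
FH header immediately ahead of its single FT line.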

In some of our work here, we converted a set of Fasta entries to EMBL,
using Bio::SeqIO, and then (some time later) tried to convert the
resulting EMBL entries to Fasta, at which point the above behaviour was 
discovered.

If everyone agrees with my understanding of how things should work, I
can submit patches...

Peter
--
Peter van Heusden				pvh@egenetics.com
Electric Genetics
