[Biopython-dev] [Bug 2477] SeqIO.parse does not handle embl files
bugzilla-daemon at portal.open-bio.org
Thu Mar 27 11:37:16 UTC 2008
http://bugzilla.open-bio.org/show_bug.cgi?id=2477
------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk 2008-03-27 07:37 EST -------
As you said, this is a multi-part bug!
To try this out, you will need to update the files Bio/GenBank/Scanner.py and
Bio/GenBank/__init__.py, which are now in CVS. If you are not familiar with CVS,
the easiest method would be to download the two files from here:
http://cvs.biopython.org/cgi-bin/viewcvs/viewcvs.cgi/biopython/Bio/GenBank/?cvsroot=biopython#dirlist
Note there is a delay of an hour or so before it will show my changes. You can
see where the files should be put from the paths in the stack trace.
Please let me know how you get on (by posting on this bug).
Missing AC lines
================
All our EMBL test cases included an AC line. Biopython 1.45 was failing on your
example because it has no AC line, and the AC line was used to set the
SeqRecord's id property. I have updated CVS to fall back on the ID line.
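
As a rough illustration of the fallback idea (this is not the actual
Scanner.py code, and the function name pick_record_id is made up for this
sketch), assuming each EMBL header line has a two-letter line code followed by
data starting at column 6:

```python
def pick_record_id(header_lines):
    """Return the first accession from an AC line, or fall back on the ID line.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    id_line_name = None
    for line in header_lines:
        code, data = line[:2], line[5:].strip()
        if code == "AC" and data:
            # AC lines hold semicolon-separated accessions; use the first one.
            return data.split(";")[0].strip()
        if code == "ID" and data and id_line_name is None:
            # Remember the first token of the ID line as a fallback.
            id_line_name = data.split()[0].rstrip(";")
    # No AC line was seen, so use whatever the ID line gave us (may be None).
    return id_line_name


if __name__ == "__main__":
    # Made-up header with no AC line, similar in spirit to the reported case.
    example = [
        "ID   CAA12345; SV 1; linear; genomic DNA; STD; HUM; 99 BP.",
        "DE   Example description line",
    ]
    print(pick_record_id(example))
```

An entry with an AC line still gets its id from the AC line; only entries
without one use the ID line.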
Multiple DE lines
=================
Already fixed as of Biopython 1.44
Multiple OC lines
=================
Updated Biopython CVS to cope with a taxonomy lineage spread over multiple OC
lines.
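
A minimal sketch of one way to merge several OC lines into a single lineage
list (again not the code in CVS; collect_taxonomy is a hypothetical name, and
the sketch assumes the lineage is semicolon-separated and ends with a full
stop):

```python
def collect_taxonomy(header_lines):
    """Join the data from all OC lines and split it into taxon names.

    Hypothetical helper for illustration only; not part of Biopython.
    """
    parts = []
    for line in header_lines:
        if line[:2] == "OC":
            # Data starts at column 6 in EMBL header lines.
            parts.append(line[5:].strip())
    # Join the continuation lines, then drop the terminating full stop.
    joined = " ".join(parts).rstrip(".")
    return [name.strip() for name in joined.split(";") if name.strip()]


if __name__ == "__main__":
    example = [
        "OC   Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;",
        "OC   Spermatophyta; Magnoliophyta; eudicotyledons.",
    ]
    print(collect_taxonomy(example))
```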
PA lines (parent accessions)
============================
You didn't report this, but we are currently ignoring the PA lines.
Quoting ftp://ftp.ebi.ac.uk/pub/databases/embl/cds/README.txt
PA line - contains the accession.version of the "parent" EMBL entry
(entry where the CDS is annotated)
The parent entry could be, for example, a whole contig rather than just this
one CDS/gene. We could record this in the SeqRecord's annotations dictionary
as a list of strings under the key 'parent-accessions'. What do you think?
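
To make the proposal concrete, here is a sketch using a plain dictionary in
place of a SeqRecord's annotations. Nothing like this exists in Biopython yet;
the function name collect_parent_accessions and the accession values in the
example are invented for illustration:

```python
def collect_parent_accessions(header_lines, annotations):
    """Append the data from each PA line to annotations['parent-accessions'].

    Hypothetical helper for illustration only; not part of Biopython.
    """
    for line in header_lines:
        if line[:2] == "PA":
            value = line[5:].strip()
            if value:
                # Create the list on first use, then keep appending to it.
                annotations.setdefault("parent-accessions", []).append(value)
    return annotations


if __name__ == "__main__":
    # Made-up accession.version values, just to show the resulting structure.
    example = [
        "ID   CAA12345; SV 1; linear; genomic DNA; STD; HUM; 99 BP.",
        "PA   AB000001.1",
        "PA   AB000002.1",
    ]
    print(collect_parent_accessions(example, {}))
```

Records without any PA line would simply have no 'parent-accessions' key.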
Peter
--
Configure bugmail: http://bugzilla.open-bio.org/userprefs.cgi?tab=email
------- You are receiving this mail because: -------
You are the assignee for the bug, or are watching the assignee.