[Biopython-dev] [Bug 3000] Could SeqIO.parse() store the whole, unparsed multiline entry?

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Jan 26 13:15:38 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3000





------- Comment #1 from biopython-bugzilla at maubp.freeserve.co.uk  2010-01-26 08:15 EST -------
(In reply to comment #0)
> Taking into account that GenBank file-format writing is not yet complete, I
> wonder whether you would allow optionally keeping, along with each parsed
> record, its unparsed multi-line representation.

You can probably do this already with the old Bio.GenBank iterator object
(I think if you don't give it a parser object, you get the raw text back).
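Independently of Bio.GenBank, the raw text of each record can be captured with a few lines of plain Python, because GenBank flat files end every record with a `//` line. This is a minimal sketch, not a Biopython API: the function name `iter_raw_genbank` is hypothetical, and it assumes well-formed input where each record starts at a `LOCUS` line.

```python
import io
from typing import Iterator, TextIO


def iter_raw_genbank(handle: TextIO) -> Iterator[str]:
    """Yield the unparsed text of each record in a GenBank flat file.

    Hypothetical helper, not a Biopython API. Assumes every record starts
    with a LOCUS line and ends with a line containing only "//".
    """
    lines = []
    for line in handle:
        # Ignore anything outside a record (blank lines, file headers).
        if not lines and not line.startswith("LOCUS"):
            continue
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    if lines:
        raise ValueError("Last record is missing its '//' terminator")


# Usage: keep each record's original text, e.g. to write it back out verbatim.
example = io.StringIO(
    "LOCUS       DEMO1\nORIGIN\n        1 acgt\n//\n"
    "LOCUS       DEMO2\nORIGIN\n        1 ggcc\n//\n"
)
for raw in iter_raw_genbank(example):
    print(raw, end="")
```

For a real GenBank file, each chunk could then be parsed on its own with `SeqIO.read(io.StringIO(raw), "genbank")`, giving a SeqRecord alongside the exact text it came from.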

Adding this to Bio.SeqIO doesn't seem a wonderful idea. The whole approach
only makes sense for sequential file formats with no header (like FASTA,
GenBank, EMBL, SwissProt), not for interleaved files (most alignments),
formats with headers, or XML formats. It also breaks down completely the
moment the user makes any modification to the SeqRecord object, and handling
that cleanly would be tricky.
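To make the modification problem concrete: one conceivable approach (a hypothetical sketch, not anything in Bio.SeqIO) is to fingerprint a record's content at parse time and reuse the raw text on output only if the fingerprint still matches. The class `RawCachedRecord` and helper `_fingerprint` are invented for illustration, and they track only a plain-string sequence and an annotations dict. A real SeqRecord also has features, letter_annotations, dbxrefs and more, and each would need the same treatment, which is why doing this cleanly is tricky.

```python
import hashlib
from dataclasses import dataclass, field
from typing import Callable


def _fingerprint(seq: str, annotations: dict) -> str:
    # repr() of the key-sorted items makes the hash independent of dict order;
    # it assumes the values have a stable repr (strings, numbers, lists...).
    payload = seq + "\x00" + repr(sorted(annotations.items()))
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


@dataclass
class RawCachedRecord:
    """Hypothetical record that remembers the text it was parsed from."""

    seq: str
    annotations: dict
    raw: str
    _parsed_fp: str = field(init=False, repr=False)

    def __post_init__(self) -> None:
        # Snapshot the tracked content as it was at parse time.
        self._parsed_fp = _fingerprint(self.seq, self.annotations)

    def raw_is_current(self) -> bool:
        return _fingerprint(self.seq, self.annotations) == self._parsed_fp

    def to_text(self, writer: Callable[["RawCachedRecord"], str]) -> str:
        # Reuse the original text only if the tracked fields are unchanged;
        # otherwise fall back to regenerating it with the (lossy) writer.
        if self.raw_is_current():
            return self.raw
        return writer(self)
```

For example, appending a keyword to `annotations["keywords"]` after parsing makes `raw_is_current()` return False, so the writer is used instead of the now-stale raw text.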

> Still, I suspect this will
> reformat the entry (currently I see the trailing dot removed from KEYWORDS;
> no REFERENCE, AUTHORS, TITLE, JOURNAL or PUBMED lines; and FEATURES.source
> being re-ordered).

Yes, using Bio.SeqIO to read/write a GenBank record will give you (slightly)
different output. We do not guarantee a 100% round trip (even on simpler
formats like FASTA). Even little things like line wrapping would make this
very difficult.
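To see why line wrapping alone defeats a byte-for-byte round trip, here is a minimal stdlib-only sketch. It is not Biopython's FASTA parser or writer: `read_fasta` and `write_fasta` are hypothetical helpers, and the default width of 60 is an arbitrary choice for this example.

```python
from typing import List, Tuple


def read_fasta(text: str) -> List[Tuple[str, str]]:
    """Parse FASTA text into (title, sequence) pairs, ignoring line breaks."""
    records: List[Tuple[str, str]] = []
    title, chunks = None, []
    for line in text.splitlines():
        if line.startswith(">"):
            if title is not None:
                records.append((title, "".join(chunks)))
            title, chunks = line[1:], []
        elif line.strip():
            chunks.append(line.strip())
    if title is not None:
        records.append((title, "".join(chunks)))
    return records


def write_fasta(records: List[Tuple[str, str]], width: int = 60) -> str:
    """Write (title, sequence) pairs, wrapping sequences at `width` letters."""
    out = []
    for title, seq in records:
        out.append(">" + title)
        out.extend(seq[i:i + width] for i in range(0, len(seq), width))
    return "\n".join(out) + "\n"


# A 100-letter sequence originally wrapped at 70 letters per line.
seq = "ACGT" * 25
original = ">demo\n" + seq[:70] + "\n" + seq[70:] + "\n"
rewritten = write_fasta(read_fasta(original))

print(read_fasta(rewritten) == read_fasta(original))  # True: same data
print(rewritten == original)  # False: different line wrapping
```

Only writing with the same width as the input (here `width=70`) reproduces the original text, and the parsed data keeps no record of what that width was.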

Regarding GenBank KEYWORDS, please file a bug.

Regarding GenBank reference lines (REFERENCE, AUTHORS, TITLE, JOURNAL, PUBMED),
this is still covered by existing Bug 2294.

Regarding GenBank source feature, please file a bug.

> Similarly, until parsing/writing of e.g. TITLE is fully supported, why
> couldn't you just store the whole multi-line thing in some variable?

The remaining unsupported bits of the ID line are covered by existing
Bug 2294 and Bug 2578.

Regarding the reference lines (REFERENCE, AUTHORS, TITLE, JOURNAL, PUBMED),
this is still covered by existing Bug 2294.

Peter

