[Biopython-dev] [Bug 3000] New: Could SeqIO.parse() store the whole, unparsed multiline entry?
bugzilla-daemon at portal.open-bio.org
Tue Jan 26 01:44:28 UTC 2010
http://bugzilla.open-bio.org/show_bug.cgi?id=3000
Summary: Could SeqIO.parse() store the whole, unparsed multiline
entry?
Product: Biopython
Version: 1.53
Platform: PC
OS/Version: Linux
Status: NEW
Severity: normal
Priority: P2
Component: Main Distribution
AssignedTo: biopython-dev at biopython.org
ReportedBy: mmokrejs at ribosome.natur.cuni.cz
Given that GenBank file-format writing is not yet complete, I wonder whether
you could optionally keep, alongside each parsed record, its unparsed
multi-line representation. For example, I use Biopython to filter out certain
records from a FASTA/GenBank file by accession, gi, or tissue (well, the last
I haven't done yet ;)). I do not change the format, I just ignore certain
entries.
I did not understand the Tutorial ("5.4.3 Getting your SeqRecord objects as
formatted strings") well, but I iterate over the records and, once I have the
record I want, I would like to be on the safe side and call something like
record._print_original_blob() to get e.g.
LOCUS ....
...
//
I do not have the record_iterator, so I cannot use the proposed
out_handle.write(record.format("genbank")) approach. Still, I suspect this
would reformat the entry (currently I see the trailing dot removed from
KEYWORDS; no REFERENCE, AUTHORS, TITLE, JOURNAL, PUBMED; and FEATURES.source
being re-ordered).
I foresee this depending on an optional argument to SeqIO.parse() by which the
user asks for the raw text to be kept in memory, on the understanding that
this is probably not very useful for large chromosomes, etc.
Similarly, until parsing/writing of e.g. TITLE is fully available, why
couldn't you just store the whole multi-line thing in some variable?
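
To illustrate the kind of behaviour being asked for, here is a minimal sketch
that filters GenBank entries by accession while writing the surviving entries
back out byte-for-byte. It does not use Biopython at all; the function names
(iter_raw_genbank, accession_of, filter_raw) are made up for this example, and
it assumes each entry ends with a line containing only "//" and carries its
accession as the second token of the ACCESSION line.

```python
def iter_raw_genbank(handle):
    """Yield each GenBank entry as its original, unmodified text (LOCUS ... //)."""
    lines = []
    for line in handle:
        # Skip blank lines between entries; keep everything inside an entry.
        if not lines and not line.strip():
            continue
        lines.append(line)
        # Assumption: a line consisting only of "//" terminates the entry.
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    if lines:
        # Trailing text without a closing "//" (e.g. a truncated file).
        yield "".join(lines)


def accession_of(raw):
    """Return the first accession on the ACCESSION line, or None if absent."""
    for line in raw.splitlines():
        if line.startswith("ACCESSION"):
            parts = line.split()
            return parts[1] if len(parts) > 1 else None
    return None


def filter_raw(in_handle, out_handle, unwanted):
    """Copy entries whose accession is not in `unwanted`; return how many were kept."""
    kept = 0
    for raw in iter_raw_genbank(in_handle):
        if accession_of(raw) in unwanted:
            continue
        out_handle.write(raw)  # original text, no reformatting
        kept += 1
    return kept
```

With two open file handles this would be called as
filter_raw(in_handle, out_handle, {"AB000001"}), where the accession is just a
placeholder. If richer criteria are needed, each raw entry could also be parsed
with SeqIO.read(StringIO(raw), "genbank") to decide whether to keep it, while
still writing the raw text rather than record.format("genbank").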