[Biopython-dev] [Bug 3000] New: Could SeqIO.parse() store the whole, unparsed multiline entry?

bugzilla-daemon at portal.open-bio.org bugzilla-daemon at portal.open-bio.org
Tue Jan 26 01:44:28 UTC 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=3000

           Summary: Could SeqIO.parse() store the whole, unparsed multiline
                    entry?
           Product: Biopython
           Version: 1.53
          Platform: PC
        OS/Version: Linux
            Status: NEW
          Severity: normal
          Priority: P2
         Component: Main Distribution
        AssignedTo: biopython-dev at biopython.org
        ReportedBy: mmokrejs at ribosome.natur.cuni.cz


Given that GenBank writing is not yet complete, I wonder whether you would
allow optionally keeping, along with each parsed record, its unparsed
multi-line representation. For example, I use Biopython to filter out certain
records from a FASTA/GenBank file by accession, GI, or tissue (well, I haven't
done the last one yet ;)). I do not change the format; I just skip certain
entries.
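The filtering described above (skipping some entries while leaving the rest untouched) can be approximated without any reformatting by treating each GenBank entry as plain text, from its LOCUS line to the closing // line. This is a minimal stdlib-only sketch, not Biopython API: the helpers iter_raw_entries, entry_accession and filter_entries are hypothetical, and it assumes well-formed flat files in which every entry ends with a // line and carries an ACCESSION line.

```python
import io
from typing import Iterator, Optional, Set, TextIO


def iter_raw_entries(handle: TextIO) -> Iterator[str]:
    """Yield each GenBank entry as its original text, LOCUS through '//'.

    Hypothetical helper: assumes every entry is terminated by a '//' line.
    """
    lines = []
    for line in handle:
        if not lines and not line.strip():
            continue  # skip blank lines between entries
        lines.append(line)
        if line.rstrip() == "//":
            yield "".join(lines)
            lines = []
    # Trailing text without a closing '//' is ignored in this sketch.


def entry_accession(raw: str) -> Optional[str]:
    """Return the first accession on the ACCESSION line, if there is one."""
    for line in raw.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                return fields[1]
    return None


def filter_entries(in_handle: TextIO, out_handle: TextIO, drop: Set[str]) -> int:
    """Copy entries unchanged, skipping those whose accession is in `drop`."""
    kept = 0
    for raw in iter_raw_entries(in_handle):
        if entry_accession(raw) in drop:
            continue
        out_handle.write(raw)  # exact copy of the original entry text
        kept += 1
    return kept


# Example: keep everything except accession A1.
sample = (
    "LOCUS       A1 ...\nACCESSION   A1\n//\n"
    "LOCUS       B2 ...\nACCESSION   B2\nKEYWORDS    foo.\n//\n"
)
out = io.StringIO()
kept = filter_entries(io.StringIO(sample), out, {"A1"})
```

Because each kept entry is written back exactly as read, details such as the trailing dot on KEYWORDS survive; the trade-off is that the filter only knows what it extracts itself (here just the first accession).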

I did not fully understand the Tutorial section "5.4.3 Getting your SeqRecord
objects as formatted strings", but I iterate over the records, and once I have
the record I want, to be on the safe side I would like to call something like
record._print_original_blob() and get e.g.

LOCUS ....
...
//

I do not have the record_iterator, so I cannot use the proposed
out_handle.write(record.format("genbank")) approach. Besides, I suspect it would
reformat the entry (currently I see the trailing dot removed from KEYWORDS; the
REFERENCE, AUTHORS, TITLE, JOURNAL and PUBMED lines missing; and FEATURES.source
being re-ordered).
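One way to check the suspicion above is to diff an entry's original text against its re-written form (for example the string returned by record.format("genbank")). The sketch below uses only Python's difflib; roundtrip_changes is a hypothetical helper name, and the two strings are made-up stand-ins rather than real Biopython output.

```python
import difflib
from typing import List


def roundtrip_changes(original: str, rewritten: str) -> List[str]:
    """Return unified-diff lines describing how `rewritten` differs from `original`.

    An empty list means the entry survived the round trip unchanged.
    """
    return list(
        difflib.unified_diff(
            original.splitlines(keepends=True),
            rewritten.splitlines(keepends=True),
            fromfile="original",
            tofile="rewritten",
        )
    )


original = "KEYWORDS    demo.\n//\n"
rewritten = "KEYWORDS    demo\n//\n"  # made-up example: trailing dot lost
changes = roundtrip_changes(original, rewritten)
```

An empty result is a cheap "safe to use the writer" signal for a given entry; any output pinpoints exactly which lines the writer changed or dropped.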

I envisage this being enabled by an optional argument to SeqIO.parse(),
specifying that the user wants to keep the raw text in memory and acknowledging
that this is probably not very useful for large chromosomes, etc.
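The proposed opt-in could look roughly like a parser flag that, when set, attaches each entry's original text to its record and otherwise discards it, so only users who ask for it pay the memory cost. The names parse_entries, keep_raw and LightRecord are hypothetical; this is not how Biopython's SeqIO.parse() is implemented, and the only "parsing" done here is reading the LOCUS name.

```python
import io
from dataclasses import dataclass
from typing import Iterator, Optional, TextIO


@dataclass
class LightRecord:
    name: str                  # LOCUS name, standing in for a fully parsed record
    raw: Optional[str] = None  # original entry text, kept only on request


def parse_entries(handle: TextIO, keep_raw: bool = False) -> Iterator[LightRecord]:
    """Yield one LightRecord per '//'-terminated entry.

    With keep_raw=False the buffered lines are discarded once the entry has
    been parsed; with keep_raw=True they are joined and stored on the record,
    so each retained record holds roughly the size of its entry in memory.
    """
    lines = []
    for line in handle:
        if not lines and not line.strip():
            continue  # skip blank lines between entries
        lines.append(line)
        if line.rstrip() == "//":
            first = lines[0].split()
            name = first[1] if len(first) > 1 and first[0] == "LOCUS" else ""
            yield LightRecord(name=name, raw="".join(lines) if keep_raw else None)
            lines = []


sample = "LOCUS       X1   10 bp    DNA\nKEYWORDS    demo.\n//\n"
plain = next(parse_entries(io.StringIO(sample)))
full = next(parse_entries(io.StringIO(sample), keep_raw=True))
```

Defaulting the flag to False keeps the normal case lean, which matters for the large-chromosome files mentioned above.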

Similarly, until parsing/writing of fields such as TITLE is fully supported,
why not just store the whole multi-line block in some variable?
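For a single field such as TITLE, storing the whole multi-line block could be as simple as the sketch below: it collects each TITLE line together with its continuation lines and returns them untouched. The function name raw_titles is hypothetical, and the layout rule (sub-keywords starting in column 3, continuation lines indented by 12 spaces so the data starts in column 13) is an assumption based on the usual GenBank flat-file layout.

```python
from typing import List


def raw_titles(entry: str) -> List[str]:
    """Return each TITLE block of a GenBank entry exactly as written.

    Assumes sub-keyword lines such as '  TITLE' start in column 3 and that
    continuation lines begin with 12 spaces (data starting in column 13).
    """
    titles = []
    current = None
    for line in entry.splitlines(keepends=True):
        if line.startswith("  TITLE"):
            if current is not None:
                titles.append("".join(current))
            current = [line]
        elif current is not None and line.startswith(" " * 12):
            current.append(line)  # continuation of the TITLE text
        elif current is not None:
            titles.append("".join(current))  # any other line ends the block
            current = None
    if current is not None:
        titles.append("".join(current))
    return titles


entry = (
    "REFERENCE   1  (bases 1 to 10)\n"
    "  AUTHORS   Doe,J.\n"
    "  TITLE     A rather long title that\n"
    "            wraps onto a second line\n"
    "  JOURNAL   Unpublished\n"
    "//\n"
)
titles = raw_titles(entry)
```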

