[BioPython] SeqRecord to Genbank: use SeqIO?

Peter biopython at maubp.freeserve.co.uk
Tue Aug 5 21:52:36 UTC 2008


Hi Cedar,

Did you mean to send this to me personally?  I hope you don't mind me
sending this reply to the list too.

> Thank you all for your replies.
>
>>> The problem is, it doesn't support something like:
>>> SeqIO.write(seq_record, out_file_handle, "genbank")
>>> Because it requires an iterable object I guess?
>
>> Yes, you would have to use this:
>> SeqIO.write([seq_record], out_file_handle, "genbank")
>
> This suggestion makes sense, but when I try it, I get:
>
>  File "downloader.py", line 40, in <module>
>    SeqIO.write([record], out_file_handle, "genbank")
>  File "/sw/lib/python2.5/site-packages/Bio/SeqIO/__init__.py", line 238, in
> write
>    raise ValueError("Unknown format '%s'" % format)
> ValueError: Unknown format 'genbank'
>
> and here is the line 40 of code it refers to:
> SeqIO.write([record], out_file_handle, "genbank")
>
> I'm running 1.47 installed via fink.

Right - because in Biopython 1.47, Bio.SeqIO doesn't support GenBank
output (as I had tried to make clear).  Earlier this week I committed
very preliminary support for writing GenBank files with Bio.SeqIO to
CVS.  Please add yourself as a CC on Bug 2294 if you want to be kept
apprised of this.
http://bugzilla.open-bio.org/show_bug.cgi?id=2294

Would it help if the error message for this situation was a little
more precise?  e.g. Rather than "Unknown format 'xxx'", perhaps
"Writing 'xxx' format is not supported yet, only reading it".
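
As a rough sketch of what I have in mind (this is not the actual
Bio.SeqIO code; the two format sets and the function name
check_output_format are made up purely for illustration), the check
could distinguish a format we can only read from one we have never
heard of:

```python
# Hypothetical registries for illustration only; the real Bio.SeqIO
# internals are organised differently.
READABLE_FORMATS = {"fasta", "genbank"}
WRITABLE_FORMATS = {"fasta"}


def check_output_format(format_name):
    """Raise ValueError with a precise message if we cannot write format_name."""
    if format_name in WRITABLE_FORMATS:
        return
    if format_name in READABLE_FORMATS:
        # Known format, but only a parser exists so far.
        raise ValueError(
            "Writing '%s' format is not supported yet, only reading it"
            % format_name
        )
    # Neither readable nor writable: genuinely unknown.
    raise ValueError("Unknown format '%s'" % format_name)
```

With something like this, asking for "genbank" output would report that
only reading is supported, while a misspelt format name would still get
the "Unknown format" message.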

>> Could I ask why you want to get the SeqRecord as a string in GenBank
>> format?
>
> Thanks for the tip for how to get a string. I want to be able to present a
> genbank file inline in a webpage. Also during trouble shooting, I was trying
> to read a genbank file in, then print it to the console, just to make sure
> things were working.

OK - wanting a SeqRecord as a string for embedding in a webpage makes
perfect sense.  For debugging, "print record" should give you
human-readable output (but it isn't in any particular format).

You have explicitly asked about SeqRecord to GenBank, but as an aside,
the Tutorial does (briefly) talk about using Bio.GenBank to get a
"genbank record" rather than a SeqRecord object.  This is a simple and
direct representation of the raw GenBank fields, and it should be
possible to use this to almost recreate the GenBank file:

>>> from Bio import GenBank
>>> gb_iterator = GenBank.Iterator(open("cor6_6.gb"), GenBank.RecordParser())
>>> for cur_record in gb_iterator : print cur_record

This won't be 100% the same as the input file, but it is close.

> I'm probably way out of line here, because frankly, I'm not the best python
> coder, and I haven't contributed a thing to biopython, but here it is
> anyway:
>
> I don't understand why SeqIO must write to a handle anyway. I think
> something like:
>
> file_handle.write(SeqIO.to_string([record], "genbank"))
>
> is just as easy as the existing method, and has the advantage of giving us
> the option of just getting a string like:
>
> genbank_string = SeqIO.to_string([record], "genbank")

When we first discussed the proposed SeqIO interface, handles were
seen as a sensible common abstraction.  The desire to get a string was
discussed but (as I recall) was not considered to be as common as
wanting to write to a file.  In fact web-server applications are still
the only example I can think of right now, and the StringIO solution
or the "to string method" discussed below cover this.
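
To illustrate the StringIO approach in general terms, here is a minimal
sketch.  The write_records function is a hypothetical stand-in for any
handle-based writer (it is not Bio.SeqIO.write, and it emits a simple
FASTA-like layout rather than GenBank), and I've used the Python 3
io.StringIO spelling; on Python 2.5 the import would be
"from StringIO import StringIO" instead.

```python
from io import StringIO


def write_records(records, handle):
    """Hypothetical handle-based writer; returns the number of records written."""
    count = 0
    for name, sequence in records:
        # Simple FASTA-like output, used here only as a placeholder format.
        handle.write(">%s\n%s\n" % (name, sequence))
        count += 1
    return count


def records_to_string(records):
    """Get the writer's output as a string by using an in-memory handle."""
    handle = StringIO()
    write_records(records, handle)
    return handle.getvalue()


text = records_to_string([("example", "ACGT")])
```

The point is that any function which writes to a handle can be turned
into a "to string" function this way, which is why a handle was seen as
the more general interface.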

> And while I'm at it, I think even easier would be:
>
> file_handle.write(record.to_format("genbank"))
> and
> genbank_string = record.to_format("genbank")
>
> would be even easier.

If you have any preference on the precise function name, please add a
comment on Bug 2561.
http://bugzilla.open-bio.org/show_bug.cgi?id=2561

> In any case, biopython make my life much easier, and I appreciate it!
> best,
> Cedar

Great :)

Peter


