[BioPython] SeqRecord to Genbank: use SeqIO?

Peter Cock p.j.a.cock at googlemail.com
Fri Aug 1 09:20:01 UTC 2008


On Fri, Aug 1, 2008 at 2:45 AM, Cedar McKay <cmckay at u.washington.edu> wrote:
> Hello, I have SeqRecord objects that I'd like to convert to a string that is
> in Genbank format. That way I can do whatever with it, including write it to
> a file. The only way I can see to do anything similar is using SeqIO and
> doing something like:
>
> SeqIO.write(my_records, out_file_handle, "genbank")
> which I found here:
> http://biopython.org/DIST/docs/tutorial/Tutorial.html#chapter:Bio.SeqIO

That code would work fine - once Bio.SeqIO supports output in the
GenBank format.  It's been on my "to do list" for a while, but because
GenBank is annotation rich, this becomes non-trivial once you start
converting to and from other file formats.
http://bugzilla.open-bio.org/show_bug.cgi?id=2294

I've been thinking about writing a unit test using the EMBOSS seqret
program for interconverting file formats, as a way of checking our
conversions against a third party.

> The problem is, it doesn't support something like:
> SeqIO.write(seq_record, out_file_handle, "genbank")
> Because it requires an iterable object I guess?

Yes, you would have to use this:
SeqIO.write([seq_record], out_file_handle, "genbank")

Of course, if lots of people really want to have the flexibility to
supply a SeqRecord or a SeqRecord list/iterator this would be
possible.  On the other hand, there is something to be said for a
simple fixed interface.
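
For what it's worth, here is a rough sketch of how the flexible variant
could be done on the caller's side without touching Bio.SeqIO.  The
helper name as_records and the FakeRecord class are made up for
illustration, and the test for "is this a single record?" (having a
.seq attribute) is just an assumption about what distinguishes a
record from a list or iterator of records:

```python
def as_records(record_or_records):
    """Return something iterable over records.

    Assumption: a single record is recognised by having a .seq
    attribute; lists and iterators of records do not have one.
    """
    if hasattr(record_or_records, "seq"):
        # A lone record: wrap it so a list-only writer can accept it.
        return [record_or_records]
    # Already a list or iterator of records: pass it through untouched.
    return record_or_records


class FakeRecord(object):
    """Hypothetical stand-in for a SeqRecord, for illustration only."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq


single = FakeRecord("demo1", "ACGT")
several = [FakeRecord("demo2", "GGCC"), FakeRecord("demo3", "TTAA")]

wrapped = as_records(single)       # a one-element list
unchanged = as_records(several)    # the same list object
```

With a helper like this you could call
SeqIO.write(as_records(seq_record), out_file_handle, "fasta")
whether you hold one record or many, while the SeqIO interface itself
stays simple and fixed.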

> And it has to write to a file handle for some reason, and
> won't just give me the string to do whatever I want with.

This is by design - the API uses handles and only handles.  If you
want a string containing the data, use StringIO (or cStringIO),
something like this:

from Bio import SeqIO
from StringIO import StringIO
handle = StringIO()
SeqIO.write(seq_records, handle, "fasta")
handle.seek(0)
data = handle.read()

This isn't in the tutorial or the wiki page (yet).
http://biopython.org/wiki/SeqIO

> I've done a lot of searching and mailing lists, and googling, and surely I
> must be missing something? What is the simplest way to get a string
> representing a genbank file, starting with a SeqRecord?
>
> I'm sort of shocked that there isn't some sort of SeqRecord.to_genbank()
> method.

We have discussed something like a SeqRecord.to_format() method (or
similar name), which would call Bio.SeqIO internally using StringIO
and return a string.  This fits in nicely with the planned __format__
and format() functionality in Python 2.6 and 3.0
http://www.python.org/dev/peps/pep-3101/

See http://portal.open-bio.org/pipermail/biopython-dev/2008-June/003793.html
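
To make the idea concrete, here is a minimal sketch of how such a
method might hang together.  None of this is Biopython code:
DemoRecord and write_fasta are hypothetical stand-ins for SeqRecord
and Bio.SeqIO.write, only a bare-bones FASTA output is handled, and
the example uses io.StringIO and the built-in format() from PEP 3101
rather than the older StringIO module:

```python
from io import StringIO


def write_fasta(records, handle):
    """Hypothetical handle-only writer, standing in for Bio.SeqIO.write."""
    count = 0
    for record in records:
        handle.write(">%s\n%s\n" % (record.id, record.seq))
        count += 1
    return count


class DemoRecord(object):
    """Hypothetical minimal record; not Biopython's SeqRecord."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def to_format(self, format_name):
        # Only "fasta" is sketched; a real version would dispatch on the name.
        if format_name != "fasta":
            raise ValueError("Unsupported format: %r" % format_name)
        handle = StringIO()
        write_fasta([self], handle)   # the writer still only sees a handle
        return handle.getvalue()      # hand the text back as a string

    def __format__(self, format_spec):
        # PEP 3101 hook: format(record, "fasta") ends up here.
        if not format_spec:
            return str(self)
        return self.to_format(format_spec)


record = DemoRecord("demo1", "ACGT")
text = format(record, "fasta")
```

The point of the design is that the handle-based writer stays the
single place where output is produced, and the string-returning method
is only a thin wrapper around it.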

Peter


