[Biopython-dev] SeqRecord to file format as string
Peter
biopython at maubp.freeserve.co.uk
Wed Jun 18 14:00:56 UTC 2008
This is returning to a thread last year, about getting a SeqRecord
into a string in a particular file format (e.g. fasta). Jared Flatow
had suggest adding a method to the SeqRecord itself.
Jared wrote:
> > ... To always have to write to a file feels strange, but I see
> > that it would be messy to go OO since there are so many formats.
> > However, giving preference to fasta over other formats by making it
> > innate doesn't seem like such a terrible idea. I do have mixed
> > feelings about 'bloating' the code which is why I asked, and you have
> > convinced me that this is not quite appropriate given existing
> > convention. However the idea would be to put the to_fasta or
> > to_format method inside the SeqRecord, then to call it from the IO
> > when needed to actually write to a file, but call it directly when
> > all that is wanted is a string...
>
> Its debatable isn't it? I suspect that for most users, when they want a
> record in a particular file format its for writing to a file. However,
> adding a to_format() method to a SeqRecord some sense (suitable for
> sequential file formats only). This would take a format name and return
> a string, by calling Bio.SeqIO with a StringIO object internally.
>
> Peter
Jared - On reflection, do you think adding a method like this to the
SeqRecord (or even just for the FASTA format) would be useful?
I recently found myself wanting to use this sort of functionality, and
remembered this old thread. This time I was wondering about using the
method name tostring (matching the name of a Seq object method). In
order to mimic the Seq object's method, the format would be optional
and when omitted would give the sequence as a string. Otherwise one
of the lower case strings used in Bio.SeqIO should be supplied. There
is a sample implementation at the end of this email.
On Wed, Oct 17, 2007 Michiel De Hoon wrote:
> How about the following:
>
> SeqIO.write(sequences, handle, format) returns the properly formatted string
> if handle==None.
I can see the above is simpler than having to supply a StringIO
handle, but it doesn't make the functionality available directly from
the SeqRecord object. It also complicates the API of the SeqIO module
with a special case.
Peter
--
######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
def tostring(self, format=None) :
"""Returns the record as a string in the specified file format.
If the file format is omitted (default), the sequence itself is
returned as a string.
Otherwise the format should be a lower case string supported by
Bio.SeqIO, which is used to turn the SeqRecord into a string."""
if format :
from StringIO import StringIO
from Bio import SeqIO
handle = StringIO()
SeqIO.write([self], handle, format)
handle.seek(0)
return handle.read()
else :
#Return the sequence as a string
return self.seq.tostring()
############################################
More information about the Biopython-dev
mailing list