[Biopython-dev] SeqRecord to file format as string

Peter biopython at maubp.freeserve.co.uk
Wed Jun 18 14:00:56 UTC 2008


This returns to a thread from last year about getting a SeqRecord
into a string in a particular file format (e.g. FASTA).  Jared Flatow
had suggested adding a method to the SeqRecord itself.

Jared wrote:
>  > ... To always have to write to a file feels strange, but I see
>  > that it would be messy to go OO since there are so many formats.
>  > However, giving preference to fasta over other formats by making it
>  > innate doesn't seem like such a terrible idea. I do have mixed
>  > feelings about 'bloating' the code which is why I asked, and you have
>  > convinced me that this is not quite appropriate given existing
>  > convention. However the idea would be to put the to_fasta or
>  > to_format method inside the SeqRecord, then to call it from the IO
>  > when needed to actually write to a file, but call it directly when
>  > all that is wanted is a string...
>
> It's debatable, isn't it?  I suspect that for most users, when they want a
> record in a particular file format it's for writing to a file.  However,
> adding a to_format() method to a SeqRecord makes some sense (suitable for
> sequential file formats only).  This would take a format name and return
> a string, by calling Bio.SeqIO with a StringIO object internally.
>
> Peter

Jared - On reflection, do you think adding a method like this to the
SeqRecord (or even just for the FASTA format) would be useful?

I recently found myself wanting to use this sort of functionality, and
remembered this old thread.  This time I was wondering about using the
method name tostring (matching the name of a Seq object method).  In
order to mimic the Seq object's method, the format would be optional
and when omitted would give the sequence as a string.  Otherwise one
of the lower case strings used in Bio.SeqIO should be supplied.  There
is a sample implementation at the end of this email.

On Wed, Oct 17, 2007 Michiel De Hoon wrote:
> How about the following:
>
> SeqIO.write(sequences, handle, format) returns the properly formatted string
> if handle==None.

I can see the above is simpler than having to supply a StringIO
handle, but it doesn't make the functionality available directly from
the SeqRecord object.  It also complicates the API of the SeqIO module
with a special case.
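
To make the comparison concrete, here is a rough sketch of the two
calling styles.  It does not use Bio.SeqIO: write_fasta() and
write_or_return() are hypothetical stand-ins, the records are plain
(identifier, sequence) tuples, and having the writer return a record
count is my assumption for illustration only.

```python
from io import StringIO


def write_fasta(records, handle):
    """Hypothetical stand-in for SeqIO.write(records, handle, "fasta").

    Each record is an (identifier, sequence) tuple, not a real SeqRecord.
    Returns the number of records written (an assumption for this sketch).
    """
    count = 0
    for identifier, sequence in records:
        handle.write(">%s\n%s\n" % (identifier, sequence))
        count += 1
    return count


def write_or_return(records, handle=None):
    """Sketch of the special case: return a string when handle is None."""
    if handle is None:
        buffer = StringIO()
        write_fasta(records, buffer)
        return buffer.getvalue()
    return write_fasta(records, handle)


records = [("example_id", "ACGT")]

# Explicit style: the caller supplies a StringIO handle.
handle = StringIO()
write_fasta(records, handle)
explicit = handle.getvalue()

# Special-case style: omit the handle and get the string back.
implicit = write_or_return(records)
```

Both styles give the same string, but in this sketch the return type of
write_or_return() depends on whether a handle was passed, which is the
kind of special case I would rather keep out of the SeqIO API.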

Peter

--

######################################
For the SeqRecord class, in Bio/SeqRecord.py
######################################
    def tostring(self, format=None) :
        """Returns the record as a string in the specified file format.

        If the file format is omitted (default), the sequence itself is
        returned as a string.

        Otherwise the format should be a lower case string supported by
        Bio.SeqIO, which is used to turn the SeqRecord into a string."""
        if format :
            from StringIO import StringIO
            from Bio import SeqIO
            handle = StringIO()
            SeqIO.write([self], handle, format)
            handle.seek(0)
            return handle.read()
        else :
            #Return the sequence as a string
            return self.seq.tostring()
############################################
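
For anyone who wants to try the calling convention without a Biopython
checkout, the following is a minimal self-contained sketch of the same
idea.  MiniRecord and its _write() helper are hypothetical stand-ins for
SeqRecord and Bio.SeqIO.write, and only a bare-bones "fasta" output is
supported; it uses io.StringIO rather than the StringIO module.

```python
from io import StringIO


class MiniRecord:
    """Hypothetical stand-in for SeqRecord: just an id and a sequence string."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def _write(self, handle, format):
        # Stand-in for SeqIO.write([self], handle, format); "fasta" only.
        if format != "fasta":
            raise ValueError("Unknown format '%s'" % format)
        handle.write(">%s\n%s\n" % (self.id, self.seq))

    def tostring(self, format=None):
        """Return the record in the given format, or the bare sequence if omitted."""
        if format:
            handle = StringIO()
            self._write(handle, format)
            return handle.getvalue()
        # No format given: return the sequence as a string
        return self.seq


record = MiniRecord("example_id", "ACGT")
sequence_only = record.tostring()      # the sequence itself
as_fasta = record.tostring("fasta")    # the record as FASTA text
```

With no argument the method mirrors the Seq object's tostring(); with a
format name it returns the formatted record.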



