[Biopython-dev] SeqRecord to file format as string
Jared Flatow
jflatow at northwestern.edu
Fri Jun 20 12:16:10 EDT 2008
On Jun 20, 2008, at 9:42 AM, Peter wrote:
> On Wed, Jun 18, 2008 at 4:16 PM, Jared Flatow <jflatow at northwestern.edu> wrote:
>> However, py3k and 2.6 will make available the functionality described in
>> PEP 3101:
>>
>> http://www.python.org/dev/peps/pep-3101/
>>
>> I think it would be best to define some semantics that are compatible with
>> this PEP.
>
> That is interesting - the PEP has been accepted, but I guess we should
> wait and see exactly what python 2.6 and 3.0 end up using before
> trying to integrate this into the SeqRecord.
I agree, there are a couple of things that may still change, but the
betas for 2.6 and 3.0 are out and that PEP has been around a while, so
I would say it's pretty much stable. At least the general mechanism
is unlikely to change.
>> In short, I think creating methods to return formatted versions of objects
>> (SeqRecords) is a good idea, but most especially if it is done in a way
>> consistent with the language's vision.
>
> That does sound wise - but I'm a little hazy on how exactly PEP-3101
> will work in practice for generic complex objects.
Yes, I had to read it through a few times to understand exactly how it
will work. Here is what I know:
All objects now get the __format__ method which has a signature like
this:
    def __format__(self, format_spec):
        # Return a formatted string; str(self) is only a placeholder body.
        return str(self)
The format_spec (format specifier) can be defined by the object, so
essentially it's totally customizable (if you want to do really crazy
things there is a Formatter class that can be customized, but we should
and can avoid this). This method works like Python's other special
methods, and there's a corresponding builtin, so calling
format(obj, "the format specifier") will simply call
obj.__format__("the format specifier"). Thus we can define the
format_spec for a SeqRecord to differentiate between FASTA and
whatever other formats we want to define.
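A minimal sketch of that dispatch, assuming a hypothetical MiniRecord class standing in for SeqRecord (it is not Biopython's class, and the hand-built FASTA string is only for illustration):

```python
class MiniRecord(object):
    """Hypothetical stand-in for a SeqRecord; not Biopython's class."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __format__(self, format_spec):
        # An empty spec means no format was requested: return the plain sequence.
        if not format_spec:
            return self.seq
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        raise ValueError("Unknown format spec: %r" % format_spec)


rec = MiniRecord("demo1", "ACGT")
fasta_text = format(rec, "fasta")  # the builtin calls rec.__format__("fasta")
plain_text = format(rec)           # the spec defaults to the empty string
```

Raising ValueError for an unrecognised spec is a design choice of this sketch, not something the PEP requires.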
The string class is also getting a .format method, which fills in a
template by calling the .__format__ method of each argument, an OO
alternative to the builtin. We can give SeqRecord a .format method in
the same spirit, and it seems like most use cases will be to call
seq_rec.format('fasta'). This works in all Python versions; only the
builtin form, format(seq_rec, 'fasta'), requires 2.6 or 3.0.
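As a rough sketch of that idea, a plain .format method can delegate to __format__ so the call does not depend on the builtin. MiniRecord is again a hypothetical stand-in, not the real SeqRecord:

```python
class MiniRecord(object):
    """Hypothetical stand-in for a SeqRecord; not Biopython's class."""

    def __init__(self, id, seq):
        self.id = id
        self.seq = seq

    def __format__(self, format_spec):
        if format_spec == "fasta":
            return ">%s\n%s\n" % (self.id, self.seq)
        return self.seq

    def format(self, format_spec=""):
        # Ordinary method call: it needs no builtin, so it also works before 2.6.
        return self.__format__(format_spec)


rec = MiniRecord("demo1", "ACGT")
via_method = rec.format("fasta")
```

On 2.6 or 3.0, format(rec, "fasta") should give the same string as rec.format("fasta"), since both end up in __format__.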
Besides the builtin format, we gain the ability to embed the format
within other strings. So, using the implementation you provided
earlier which just returns the underlying Seq as a string if no format
is specified, we might define the __format__ method like this:
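    def __format__(self, format_spec=None):
        # With a format spec (e.g. "fasta"), write the record via SeqIO.
        if format_spec:
            from StringIO import StringIO  # Python 2; io.StringIO in 3.0
            from Bio import SeqIO
            handle = StringIO()
            SeqIO.write([self], handle, format_spec)
            handle.seek(0)
            return handle.read()
        # No spec: fall back to the plain sequence string.
        return str(self)

    def __str__(self):
        return str(self.seq)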
Now that means I can also embed this in formatted strings, like so:
    "this is my sequence: {0}".format(seq_rec)
Or:
    "this is my sequence in fasta format: {0:fasta}".format(seq_rec)
All in all, it's pretty much what you'd expect (and the same as what
you had before). There are only a few small benefits we get from doing
it this way (right now), but I don't think we can go wrong using the
__format__ method like it was meant to be used, and who knows what
future use cases this may simplify.
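To see what the {0} and {0:fasta} replacement fields hand to __format__, here is a small sketch using a hypothetical SpecEcho class that just reports the spec it receives. Note that an omitted spec arrives as an empty string rather than None:

```python
class SpecEcho(object):
    """Hypothetical helper: echoes the format_spec it was given."""

    def __format__(self, format_spec):
        return "<spec=%r>" % format_spec


no_spec = "seq: {0}".format(SpecEcho())          # __format__ receives ""
with_spec = "seq: {0:fasta}".format(SpecEcho())  # __format__ receives "fasta"
```

Because the empty string is falsy, an "if format_spec:" test treats "{0}" the same way as a missing argument.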
jared