[BioPython] a sequence set object in biopython?
Peter
biopython at maubp.freeserve.co.uk
Wed Nov 12 13:06:19 EST 2008
On Wed, Nov 12, 2008 at 5:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example to turn the SequenceSet into a FASTA file, then
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects. For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function. Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.
If you really want a list-like object with a format method in your
code, how about something like this:

class SeqRecordList(list):
    """Subclass of the Python list, to hold SeqRecord objects only."""
    # TODO - Override the list methods to make sure all the items
    # are indeed SeqRecord objects

    def format(self, format):
        """Return a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function. This must be a lower case string.
        """
        from Bio import SeqIO
        from io import StringIO
        handle = StringIO()
        SeqIO.write(self, handle, format)
        return handle.getvalue()


if __name__ == "__main__":
    print("Loading records...")
    from Bio import SeqIO
    # Close the input file once all the records have been read
    with open("ls_orchid.gbk") as handle:
        my_list = SeqRecordList(SeqIO.parse(handle, "genbank"))
    print(len(my_list))
    for format in ["fasta", "tab"]:
        print()
        print(format)
        print("=" * len(format))
        print(my_list.format(format))

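The TODO in the class above (overriding the list methods so that only
SeqRecord objects get in) could be approached roughly as follows. This is
only a sketch under some assumptions: TypedList, RecordList and Record are
hypothetical names made up for illustration, and Record is a stand-in so
the example runs without Biopython. In real code item_type would be set to
Bio.SeqRecord.SeqRecord instead.

```python
class TypedList(list):
    """A list that only accepts instances of ``item_type`` (hypothetical helper)."""

    item_type = object  # subclasses narrow this, e.g. to SeqRecord

    def __init__(self, iterable=()):
        # Validate every initial item before it is stored
        super().__init__(self._check(item) for item in iterable)

    def _check(self, item):
        """Return item unchanged, or raise TypeError if it has the wrong type."""
        if not isinstance(item, self.item_type):
            raise TypeError(
                "expected %s, got %s"
                % (self.item_type.__name__, type(item).__name__)
            )
        return item

    def append(self, item):
        super().append(self._check(item))

    def insert(self, index, item):
        super().insert(index, self._check(item))

    def extend(self, iterable):
        # Check everything first, so a bad item leaves the list untouched
        super().extend([self._check(item) for item in iterable])

    def __setitem__(self, index, value):
        if isinstance(index, slice):
            value = [self._check(item) for item in value]
        else:
            self._check(value)
        super().__setitem__(index, value)

    def __iadd__(self, other):
        self.extend(other)
        return self


class Record:
    """Hypothetical stand-in for Bio.SeqRecord.SeqRecord."""

    def __init__(self, id):
        self.id = id


class RecordList(TypedList):
    item_type = Record


if __name__ == "__main__":
    records = RecordList([Record("a")])
    records.append(Record("b"))
    try:
        records.append("not a record")
    except TypeError as err:
        print("Rejected:", err)
    print(len(records))
```

Methods that return new lists (slicing, + and so on) still give back a
plain list here, so a fuller version would need to wrap those as well.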
Peter