[Biopython-dev] [BioPython] a sequence set object in biopython?

Peter biopython at maubp.freeserve.co.uk
Wed Nov 12 18:06:19 UTC 2008


On Wed, Nov 12, 2008 at 5:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example of turning the SequenceSet into a FASTA file,
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.
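The list-based workflow described above can be sketched as follows. This is a minimal illustration that assumes Python 3 and a recent Biopython. The two records are made-up toy sequences, and an in-memory StringIO stands in for a FASTA file on disk. The external alignment step is left out because it depends on which tool is installed.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A plain Python list of SeqRecord objects (toy sequences, made up for illustration).
records = [
    SeqRecord(Seq("ACGTACGTGG"), id="seq1", description="first example"),
    SeqRecord(Seq("ACGTTCGTGA"), id="seq2", description="second example"),
]

# Bio.SeqIO.write() accepts any iterable of SeqRecord objects, including a list,
# and returns the number of records written.
# The in-memory handle stands in for a real output file.
handle = StringIO()
count = SeqIO.write(records, handle, "fasta")
print(count, "records written")

fasta_text = handle.getvalue()
print(fasta_text)

# At this point an external aligner could be run on the FASTA file, and its
# output loaded with Bio.AlignIO or Bio.SeqIO. Here we simply read the
# unaligned FASTA back in with Bio.SeqIO.
handle.seek(0)
reloaded = list(SeqIO.parse(handle, "fasta"))
print([rec.id for rec in reloaded])
```

Because SeqIO.write() takes any iterable of records, a plain list needs no wrapper class for this round trip.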

If you really want a list-like object with a format method in your
code, how about something like this:

from io import StringIO

from Bio import SeqIO


class SeqRecordList(list):
    """Subclass of the Python list, to hold SeqRecord objects only."""
    # TODO - Override the list methods to make sure all the items
    # are indeed SeqRecord objects

    def format(self, format):
        """Returns a string of all the records in a requested file format.

        The argument format should be any file format supported by
        the Bio.SeqIO.write() function.  This must be a lower case string.
        """
        handle = StringIO()
        SeqIO.write(self, handle, format)
        # getvalue() returns everything written so far, no need to seek back
        return handle.getvalue()


if __name__ == "__main__":
    print("Loading records...")
    # ls_orchid.gbk is the GenBank example file used in the Biopython tutorial;
    # passing the filename lets SeqIO open and close the file itself
    my_list = SeqRecordList(SeqIO.parse("ls_orchid.gbk", "genbank"))
    print(len(my_list))
    for fmt in ["fasta", "tab"]:
        print()
        print(fmt)
        print("=" * len(fmt))
        print(my_list.format(fmt))
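One way to address the TODO in the class above is to validate items whenever they enter the list. The following is a minimal sketch, not part of the original post, assuming Python 3 and a recent Biopython; the class name CheckedSeqRecordList and its _check helper are hypothetical. It overrides append, insert, extend, item assignment and +=. Operations such as + or slicing still return plain lists.

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord


class CheckedSeqRecordList(list):
    """List subclass that only accepts SeqRecord objects (illustrative sketch)."""

    def __init__(self, records=()):
        records = list(records)
        for record in records:
            self._check(record)
        super().__init__(records)

    @staticmethod
    def _check(item):
        # Reject anything that is not a Bio.SeqRecord.SeqRecord
        if not isinstance(item, SeqRecord):
            raise TypeError(f"Expected a SeqRecord, got {type(item).__name__}")

    def append(self, item):
        self._check(item)
        super().append(item)

    def insert(self, index, item):
        self._check(item)
        super().insert(index, item)

    def extend(self, items):
        items = list(items)
        for item in items:
            self._check(item)
        super().extend(items)

    def __setitem__(self, index, value):
        # Slice assignment receives an iterable, single-index assignment one item
        if isinstance(index, slice):
            value = list(value)
            for item in value:
                self._check(item)
        else:
            self._check(value)
        super().__setitem__(index, value)

    def __iadd__(self, items):
        self.extend(items)
        return self

    def format(self, fmt):
        """Return all records as a string in a Bio.SeqIO output format."""
        handle = StringIO()
        SeqIO.write(self, handle, fmt)
        return handle.getvalue()


# Usage: build a list, add a record, then render it as FASTA.
records = CheckedSeqRecordList([SeqRecord(Seq("ACGT"), id="a", description="")])
records.append(SeqRecord(Seq("GGCC"), id="b", description=""))
fasta_text = records.format("fasta")
print(fasta_text)

# Anything that is not a SeqRecord is rejected with a TypeError.
try:
    records.append("ACGT")
    rejected = False
except TypeError as err:
    rejected = True
    print(err)
```

Checking at insertion time means format() can trust that every item is a SeqRecord, at the cost of overriding several list methods.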


Peter


