[BioPython] a sequence set object in biopython?

Peter biopython at maubp.freeserve.co.uk
Wed Nov 12 17:53:35 UTC 2008


On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
<dalloliogm at gmail.com> wrote:
> Hi,
> I think it could be useful to add a generic SequenceSet object in biopython.
> Such an object would represent a generic set of sequences, and could
> have some useful methods like .format('fasta') or
> .align('alignment_tool').
> Is there something similar available already?

Since your example turns the SequenceSet into a FASTA file, you are
clearly thinking of a collection of SeqRecord objects rather than just
Seq objects.  For this kind of thing I personally just use a list of
SeqRecord objects.

If I want to turn a list of SeqRecord objects into a FASTA file, I can
pass the list to the Bio.SeqIO.write() function.  Once I've made a
FASTA file, I can call an external tool to align them - and then load
them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
to do next.
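
As a minimal sketch of that round trip (assuming a reasonably recent
Biopython; the record ids and sequences are made up for illustration, an
in-memory handle stands in for a real file, and the external alignment
step is left out):

```python
from io import StringIO

from Bio import SeqIO
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# A plain list of SeqRecord objects plays the role of the "sequence set".
# The ids and sequences here are hypothetical examples.
records = [
    SeqRecord(Seq("ACGTACGT"), id="seq1", description=""),
    SeqRecord(Seq("ACGTTCGT"), id="seq2", description=""),
]

# Write the whole list out in FASTA format.
handle = StringIO()
count = SeqIO.write(records, handle, "fasta")
fasta_text = handle.getvalue()

# ... an external alignment tool would be run on the FASTA file here ...

# Load the sequences back in as a list of SeqRecord objects.
reloaded = list(SeqIO.parse(StringIO(fasta_text), "fasta"))
```

With a real file you would pass a filename or an open file handle
instead of the StringIO object.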

> I have noticed that the actual Generic.Alignment is very similar to
> such an object. However, it would be better to be able to work with a
> separated class, because sometimes you want to deal with sequences
> that are not aligned.

Yes, the generic alignment is basically a list of SeqRecord objects
plus some extra functionality like column access.
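
To illustrate the difference, here is a small sketch.  It assumes a
recent Biopython release, where the alignment class is
Bio.Align.MultipleSeqAlignment rather than the older generic alignment
class; the three short sequences are invented:

```python
from Bio.Align import MultipleSeqAlignment
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical aligned sequences (all the same length, gaps as "-").
alignment = MultipleSeqAlignment([
    SeqRecord(Seq("ACG-T"), id="a"),
    SeqRecord(Seq("ACGAT"), id="b"),
    SeqRecord(Seq("AC-AT"), id="c"),
])

# Iterating over the alignment gives SeqRecord objects, like a list would.
row_ids = [record.id for record in alignment]

# The extra functionality: pull out a single column as a string.
third_column = alignment[:, 2]

# Number of columns in the alignment.
length = alignment.get_alignment_length()
```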

> Some use cases:
> - a set of sequences that represents all introns in a particular gene,
> on which I want to calculate the conservation of the splicing
> regulatory sites.
> - all gene sequences in an organism, which I want to convert to EMBL format
> - a set of seqs to be aligned or used as input for other tools
> etc..

All sensible use cases - but all seem to be covered by a simple Python
list of SeqRecord objects, or in some cases a list of Seq objects
(e.g. the introns example, as I doubt the introns have names).
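
For the introns case, a rough sketch of what I mean (the two intron
sequences and the "intron_N" naming scheme are made up for
illustration):

```python
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord

# Hypothetical unnamed intron sequences, held as a plain list of Seq objects.
introns = [Seq("GTAAGTCCTAG"), Seq("GTGAGTTTCAG")]

# Many analyses only need the sequences, e.g. the first two bases
# (the splice donor site) of each intron.
donor_sites = [str(seq[:2]) for seq in introns]

# If names are needed later (say, to write a FASTA file), wrap each Seq
# in a SeqRecord with a generated identifier.
intron_records = [
    SeqRecord(seq, id="intron_%i" % (index + 1), description="")
    for index, seq in enumerate(introns)
]
```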

Peter


