[BioPython] a sequence set object in biopython?

Giovanni Marco Dall'Olio dalloliogm at gmail.com
Wed Nov 12 13:17:48 EST 2008


On Wed, Nov 12, 2008 at 6:53 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> On Wed, Nov 12, 2008 at 4:25 PM, Giovanni Marco Dall'Olio
> <dalloliogm at gmail.com> wrote:
>> Hi,
>> I think it could be useful to add a generic SequenceSet object in biopython.
>> Such an object would represent a generic set of sequences, and could
>> have some useful methods like .format('fasta') or
>> .align('alignment_tool').
>> Is there something similar available already?
>
> Given your example of turning the SequenceSet into a FASTA file,
> clearly you are thinking of a collection of SeqRecord objects rather
> than just Seq objects.  For this kind of thing I personally just use a
> list of SeqRecord objects.
>
> If I want to turn a list of SeqRecord objects into a FASTA file, I can
> pass the list to the Bio.SeqIO.write() function.  Once I've made a
> FASTA file, I can call an external tool to align them - and then load
> them in again using Bio.AlignIO or Bio.SeqIO depending on what I plan
> to do next.
>
>> Some use cases:
>> - a set of sequences that represents all introns in a particular gene,
>> on which I want to calculate the conservation of the splicing
>> regulatory sites.
>> - all gene sequences in an organism, which I want to convert to EMBL format
>> - a set of seqs to be aligned or used as input for other tools
>> etc.
>
> All sensible use cases - but all seem to be covered by a simple python
> list of SeqRecord objects, or in some cases a list of Seq objects
> (e.g. the introns example, as I doubt the introns have names).
>

Not always.
For example, if I have a set of genes in an organism, sometimes I
need to access only some of them, by their id; so a
__getitem__ method, to make it work like a dictionary, would also be
useful.
I think such an object would be used so widely that it would be
worth implementing in Biopython.
What I would do, honestly, is create a GenericSeqRecordSet class
from which to derive Alignment, specifying that in an alignment all
the sequences must have the same length. It would not require much
work and it would change the interface.
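A minimal sketch of the proposed hierarchy, in plain Python and under stated assumptions: `GenericSeqRecordSet`, its methods, the `_check` hook and the `Record` stand-in are all hypothetical illustrations, not existing Biopython API. Records only need `.id` and `.seq` attributes, which Biopython's SeqRecord also has. Dictionary-style lookup by id is provided by `__getitem__`, and Alignment reuses the whole interface, adding only the equal-length rule. (For a plain list of SeqRecord objects, Bio.SeqIO.to_dict() already builds a dict keyed by record id.)

```python
from dataclasses import dataclass


@dataclass
class Record:
    """Hypothetical stand-in for a SeqRecord: just an id and a sequence."""
    id: str
    seq: str


class GenericSeqRecordSet:
    """Hypothetical ordered collection of records, also indexable by id."""

    def __init__(self, records=()):
        self._records = []
        self._by_id = {}
        for record in records:
            self.append(record)

    def _check(self, record):
        """Validation hook for subclasses; the generic set accepts anything."""

    def append(self, record):
        if record.id in self._by_id:
            raise ValueError(f"duplicate id: {record.id}")
        self._check(record)
        self._records.append(record)
        self._by_id[record.id] = record

    def __len__(self):
        return len(self._records)

    def __iter__(self):
        return iter(self._records)

    def __getitem__(self, key):
        # Integers index by position, anything else is looked up by record id.
        if isinstance(key, int):
            return self._records[key]
        return self._by_id[key]

    def format(self, fmt):
        # Only FASTA in this sketch; other formats are left out on purpose.
        if fmt != "fasta":
            raise ValueError(f"unsupported format: {fmt}")
        return "".join(f">{r.id}\n{r.seq}\n" for r in self._records)


class Alignment(GenericSeqRecordSet):
    """Same interface, plus the rule that all sequences share one length."""

    def _check(self, record):
        if self._records and len(record.seq) != len(self._records[0].seq):
            raise ValueError("all sequences in an alignment must have the same length")
```

Usage: `genes = GenericSeqRecordSet([Record("geneA", "ATGAAA"), Record("geneB", "ATGC")])` lets you write `genes["geneB"]` or `genes[0]`, and `genes.format("fasta")` returns the FASTA text. Building an `Alignment` from sequences of different lengths raises `ValueError`. Putting the length rule in an overridable hook keeps the subclass small, which fits the idea that deriving Alignment would not require much work.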


very tiny little minusculus p.s.: if you need help implementing such a
thing, or anything else, I can volunteer :).

> Peter
>



-- 
-----------------------------------------------------------

My Blog on Bioinformatics (Italian): http://bioinfoblog.it

