[BioPython] a sequence set object in biopython?

Peter biopython at maubp.freeserve.co.uk
Wed Nov 12 18:36:11 UTC 2008


Giovanni Marco Dall'Olio wrote:
>> All sensible use cases - but all seem to be covered by a simple python
>> list of SeqRecord objects, or in some cases a list of Seq objects
>> (e.g. the introns example, as I doubt the introns have names).
>
> Not always.
> For example, if I have a set of genes in an organism, sometimes I
> would need to access only some of them, by their id; so, a
> __getattribute__ method to make it work as a dictionary could also be
> useful.

OK, then use a dict of SeqRecords for this, as shown in the tutorial
chapter for Bio.SeqIO and the wiki.  We even have a helper function
Bio.SeqIO.to_dict() to do this and check for duplicate keys.
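
As a rough sketch of the idea (not Biopython's actual code), indexing
records by id with a duplicate check could look like the following. The
Record type and the records_to_dict name are stand-ins invented here so
the snippet runs without Biopython installed; with real SeqRecord
objects, Bio.SeqIO.to_dict(records) should do the same job in one call.

```python
from collections import namedtuple

# Hypothetical stand-in for Bio.SeqRecord.SeqRecord, for illustration only.
Record = namedtuple("Record", ["id", "seq"])


def records_to_dict(records):
    """Index records by their id, refusing duplicate ids."""
    by_id = {}
    for rec in records:
        if rec.id in by_id:
            raise ValueError("Duplicate key %r" % rec.id)
        by_id[rec.id] = rec
    return by_id


genes = [Record("geneA", "ATGGCC"), Record("geneB", "ATGTTT")]
gene_index = records_to_dict(genes)

# Access only the gene of interest, by its id.
print(gene_index["geneB"].seq)
```

The duplicate check matters because a plain dict would otherwise
silently keep only the last record seen for a repeated id.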

If you need an order-preserving dictionary, there are examples of this
on the net and there is even PEP 372 for adding this to Python itself:
http://www.python.org/dev/peps/pep-0372/
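
For what it is worth, PEP 372 was later accepted, and the class is
available as collections.OrderedDict in Python 2.7 and 3.1 onwards. A
minimal sketch, assuming such a Python version and using made-up gene
ids and sequences:

```python
from collections import OrderedDict

# Made-up ids and sequences, inserted in a deliberately non-sorted order.
pairs = [("geneC", "ATGAAA"), ("geneA", "ATGGCC"), ("geneB", "ATGTTT")]

ordered = OrderedDict()
for gene_id, seq in pairs:
    ordered[gene_id] = seq

# Iteration follows insertion order, and lookup by id still works.
print(list(ordered))
print(ordered["geneA"])
```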

> The fact is that I think that such an object would be so widely used,
> that maybe it would be useful to implement it in biopython.
> What I would do, honestly, is to create a GenericSeqRecordSet class
> from which to derive Alignment, specifying that in an alignment all
> the sequences should have the same length. It would not require much
> work and it would change the interface.

I agree that IF we added some sort of "GenericSeqRecordSet class", it
might be sensible for the alignment objects to subclass it -
especially if you want it to behave like a Python list primarily.
Note that in Python, sets are not order preserving.
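
Purely to make the proposal concrete, here is one way such a class
hierarchy might be sketched. Nothing like this exists in Biopython; the
names GenericSeqRecordSet, EqualLengthSet, get_by_id and Record are all
hypothetical, and only append() is checked, to keep the sketch short.

```python
from collections import namedtuple

# Hypothetical stand-in for a SeqRecord, for illustration only.
Record = namedtuple("Record", ["id", "seq"])


class GenericSeqRecordSet(list):
    """Hypothetical list of records that also allows lookup by id."""

    def get_by_id(self, record_id):
        # Linear scan; a real implementation might keep an index instead.
        for rec in self:
            if rec.id == record_id:
                return rec
        raise KeyError(record_id)


class EqualLengthSet(GenericSeqRecordSet):
    """Hypothetical alignment-like subclass: all sequences share one length."""

    def append(self, record):
        # Only append() is guarded here; extend(), insert() and the
        # constructor would need the same check in a fuller version.
        if self and len(record.seq) != len(self[0].seq):
            raise ValueError("sequence length differs from existing records")
        super().append(record)


aln = EqualLengthSet()
aln.append(Record("a", "ATG"))
aln.append(Record("b", "ATC"))
print(aln.get_by_id("b").seq)
```

Because the base class is a list, order is preserved and slicing and
iteration come for free, which is the behaviour a set would not give.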

> Very tiny little minuscule P.S.: if you need help implementing such a
> thing or anything else, I can volunteer :).

That's good to hear :)

However, we'd have to establish the need for this new object first -
but so far we've only had two people's views, so it's too early to form a
consensus.  I don't see a strong reason for adding yet another object,
when the core language provides lists, sets and dicts, which seem to be
enough.

Peter


