[Biopython-dev] sequence class proposal
Jose Blanca
jblanca at btc.upv.es
Thu May 22 07:30:52 UTC 2008
Dear Biopython developers,
I've been using Python and Biopython for some time now and I would like to
talk with you about the sequence classes in Biopython. I have had some issues
using the SeqRecord and Alignment classes, and I have been discussing and
implementing with two students (Victor Sanchez and Pablo Martinez) a proposal
for a new sequence class. We would like to present this implementation as a
contribution to the discussion about the design of the sequence classes in
Biopython, and we're eager to receive your comments.
The first problem that I found with SeqRecord is the lack of support for
qualities. It is also difficult to implement quality support in a class
derived from SeqRecord, because the current SeqRecord API gets in the way.
Let me explain.
Currently SeqRecord has a seq property, and if you want a slice or need
to reverse or complement the sequence you would do something like:
from Bio.Seq import Seq
from Bio.SeqRecord import SeqRecord
my_seq = SeqRecord(Seq('ACTG'))
my_seq.seq[0:2]
# the operation is called on the Seq, not on the SeqRecord
my_seq.seq = my_seq.seq[::-1]
If I derive a class from SeqRecord with a qual property, I don't know how to
reverse both the sequence and the quality at the same time, because the
Seq methods are called directly without SeqRecord being aware of it. To
support this we have discussed a new class with a slightly different
API and we have done a preliminary implementation. We have named this new
class RichSeq, and we think that it could solve the quality problem.
With this new class it would work like this:
myseq = RichSeq(seq='ACTG', qual=[50,50,50,50])
subseq = myseq[0:2]
myseq.reverse()
myseq.complement()
RichSeq is equivalent to SeqRecord and it has the same properties as
SeqRecord, but it adds the methods __getitem__, reverse, complement and
reverse_complement.
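A minimal sketch of the idea, not the attached implementation: when one object owns both the sequence and the qualities, slicing and reversing can keep them in step. The class name QualSeqSketch, the use of a plain string and list, and the restriction to slice indexing are assumptions made only for this illustration.

```python
# Hypothetical sketch; this is NOT the RichSeq code from the attachment.
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


class QualSeqSketch:
    """Sequence plus per-base qualities that are always transformed together."""

    def __init__(self, seq, qual=None):
        if qual is not None and len(qual) != len(seq):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        self.qual = list(qual) if qual is not None else None

    def __getitem__(self, index):
        # Only slices are supported in this sketch, so the result is a record.
        if not isinstance(index, slice):
            raise TypeError("only slices are supported in this sketch")
        qual = self.qual[index] if self.qual is not None else None
        return QualSeqSketch(self.seq[index], qual)

    def reverse(self):
        # In place, since the proposal describes a mutable class.
        self.seq = self.seq[::-1]
        if self.qual is not None:
            self.qual.reverse()

    def complement(self):
        # Qualities belong to positions, so complementing leaves them alone.
        self.seq = self.seq.translate(_COMPLEMENT)

    def reverse_complement(self):
        self.reverse()
        self.complement()


rec = QualSeqSketch("ACTG", qual=[10, 20, 30, 40])
sub = rec[0:2]            # slice taken before the record is modified
rec.reverse_complement()  # seq and qual are reversed together
```

Because the caller never touches the inner sequence directly, there is a single place where both attributes are updated.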
We have also implemented a new type of feature, which we have called
RichFeature. It is similar to SeqFeature. The main difference is that
instead of a location and a location operator, it has a BioRange (another
new class). This BioRange is inspired by/copied from the Bioperl library. The
BioRange is optional, so some RichFeature uses would be:
RichFeature(id='a_feature', type='annotation', feature='this is an annotation')
RichFeature(id='a_feature', type='subsequence', feature=Seq('ACTG'))
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range, feature='some_annotation, e.g. an exon')
seq = RichSeq(seq='ACTGACTG', features=[feat])
With this implementation you can easily define a sequence with seq, qual and
annotations associated with a range, and after that you can reverse and
complement them trivially.
range = BioRange(start=3,end=6)
feat = RichFeature(type='annotation', range=range, feature='some_annotation')
seq = RichSeq(seq='ACTGACTG', qual=[60,60,60,60,60,60,60,60], features=[feat])
seq.reverse()
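To illustrate what reversing has to do to a feature range, here is a small sketch. It assumes 1-based, inclusive coordinates in the Bioperl style; the coordinate convention of BioRange is not stated in this mail, and RangeSketch and its methods are hypothetical names used only here.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RangeSketch:
    """Hypothetical stand-in for BioRange; 1-based and inclusive (assumed)."""
    start: int
    end: int

    def reversed_on(self, length):
        """Range covering the same bases once a parent sequence of the
        given length has been reversed."""
        return RangeSketch(length - self.end + 1, length - self.start + 1)

    def extract(self, seq):
        # Convert 1-based inclusive coordinates to a Python slice.
        return seq[self.start - 1:self.end]


seq = "ACTGACTG"
rng = RangeSketch(start=2, end=4)    # covers "CTG"
flipped = rng.reversed_on(len(seq))  # same bases on the reversed sequence
```

Note that the 3..6 range on the 8-base sequence used in the example above maps onto itself when reversed, which is why an off-centre range is used here.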
By the way, this is a mutable class, although that could be easily changed.
You can even use Seqs and RichSeq as subsequences and ask for slices or
complements.
range = BioRange(start=1,end=2)
feat = RichFeature(type='subsequence', feature=RichSeq(seq='CT'), range=range)
seq = RichSeq(seq='ACTG', features=[feat])
seq2 = seq[1:2]
seq.reverse()
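Slicing raises a similar question for features: what happens to a range that only partly falls inside the slice? One possible policy, shown purely as a sketch, is to clip the range and shift it into the coordinates of the slice. The function name slice_range, the 1-based inclusive feature coordinates, and the clipping policy itself are assumptions and may differ from what the attached code does.

```python
def slice_range(start, end, lo, hi):
    """Map a 1-based inclusive feature range (start, end) onto parent[lo:hi].

    lo and hi are a non-negative, step-1 Python slice (0-based, half-open).
    Returns the clipped range in the slice's own 1-based inclusive
    coordinates, or None if the feature lies entirely outside the slice.
    """
    # Feature as 0-based half-open interval: [start - 1, end).
    f_lo, f_hi = start - 1, end
    # Intersect the feature with the slice.
    c_lo, c_hi = max(f_lo, lo), min(f_hi, hi)
    if c_lo >= c_hi:
        return None
    # Shift so parent position lo becomes 0, then go back to 1-based inclusive.
    return (c_lo - lo + 1, c_hi - lo)


parent = "ACTGACTG"
clipped = slice_range(3, 6, 4, 8)   # feature "TGAC" against parent[4:8]
outside = slice_range(1, 2, 4, 8)   # feature "AC" at the start, not in slice
```

An alternative policy would be to drop any feature that is not fully contained in the slice; which one is preferable is an API decision.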
This capability makes this RichSeq an excellent candidate for a base class for
an Alignment implementation, but we have not implemented this yet.
Attached to this mail you can find the implementation of these new classes.
They have some tests that give an idea of their intended use. We would like
to know your opinions and suggestions. Do you think that this kind of
functionality is desirable? Please let us know about any flaws, especially in
the API. I think that my work would be easier using a sequence class similar
to RichSeq, but maybe there's an easier way.
Do you think that it is a good idea to attach these classes to bugzilla? Should
we open a new bug, or is there one already open for this sequence class debate?
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)
-------------- next part --------------
A non-text attachment was scrubbed...
Name: richseq.0.0.1.tar.gz
Type: application/x-tgz
Size: 7075 bytes
Desc: not available
URL: <http://lists.open-bio.org/pipermail/biopython-dev/attachments/20080522/aba24889/attachment-0002.bin>