[Biopython-dev] about the SeqRecord and SeqFeature classes

Peter biopython at maubp.freeserve.co.uk
Fri Sep 26 05:50:39 EDT 2008


On Thu, Sep 25, 2008 at 3:49 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> Hi:
>
> On Tuesday 23 September 2008 16:37:29 Peter wrote:
>> > SeqRecord still doesn't have a __getitem__ method.
>>
>> What do you think of the __getitem__ method proposed in attachment 942
>> on Bug 2507?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2507
> I've been looking at the patch and it is just what I need.
> Using a SeqRecord with that __getitem__ method is almost trivial.

Good :)

I'd like to check this into CVS but it would be best to have a third
person comment on the code first.

Once (if) this is included, I would then plan to use this for slicing
alignment objects (Bug 2551)
http://bugzilla.open-bio.org/show_bug.cgi?id=2551

> Attached to this email, in mySeqRecord.py, is a possible implementation.
> What do you think? For the qualities a tuple of ints would do.

I see you have created a subclass of SeqRecord to add a quality
property, and made sure this gets sliced too in the __getitem__.  This
is a nice approach (and demonstrates how people could extend the basic
Biopython objects in their own code).  I would also suggest checking in
the __init__ method that the quality sequence is the same length
as the sequence itself.  Your code looks like it would cope with any
Python sequence object (string, list, tuple) for the quality, and you
could use integers or floats here.  Very flexible.
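
To make the length check and the sliced quality concrete, here is a
minimal sketch.  It is not the code from mySeqRecord.py or the patch on
Bug 2507: SimpleRecord is a hypothetical stand-in for SeqRecord so the
example runs without Biopython, and the name QualityRecord is also
invented for illustration.

```python
class SimpleRecord(object):
    """Hypothetical stand-in for SeqRecord (NOT the Biopython class)."""

    def __init__(self, seq, id="<unknown id>"):
        self.seq = seq
        self.id = id

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # A slice gives a new record; an integer gives a single letter.
        if isinstance(index, slice):
            return SimpleRecord(self.seq[index], id=self.id)
        return self.seq[index]


class QualityRecord(SimpleRecord):
    """Record with one quality score per letter (illustrative only)."""

    def __init__(self, seq, quality, id="<unknown id>"):
        # The suggested check: quality must match the sequence length.
        if len(quality) != len(seq):
            raise ValueError("quality has %i entries but the sequence has %i"
                             % (len(quality), len(seq)))
        SimpleRecord.__init__(self, seq, id=id)
        self.quality = quality

    def __getitem__(self, index):
        # Slice the sequence and the quality with the same slice object,
        # so the two stay in step.
        if isinstance(index, slice):
            return QualityRecord(self.seq[index], self.quality[index],
                                 id=self.id)
        return self.seq[index]


record = QualityRecord("ACGTAC", (30, 31, 32, 33, 34, 35), id="read1")
sub_record = record[1:4]  # sequence "CGT" with quality (31, 32, 33)
```

Because the quality is only indexed and measured with len(), a string,
list or tuple of ints or floats would all work here.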

If we were to add something like this to Biopython directly, I prefer
"quality" over "qual" (just three letters longer but much clearer).  I
would also consider adding the quality to the Seq object (subclassing
the Seq object rather than the SeqRecord object).  My reasoning is
that for 454 or Solexa sequencing, you will have thousands of reads
and all you really care about is the nucleotide sequence and the
quality scores.  Unless you want to give them all unique names, there
is little point having the overhead of the various annotation properties
of the SeqRecord.
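
As a rough illustration of the lighter-weight alternative, the sketch
below keeps only the sequence and its quality scores, with no id, name
or annotation.  QualityRead is a hypothetical name and it does not
subclass the real Seq object (a plain string is used so the example is
self-contained); the use of __slots__ is my own assumption about how
one might keep per-read overhead small.

```python
class QualityRead(object):
    """Sequence plus per-letter quality, nothing else (illustrative)."""

    # No per-instance __dict__, which helps when holding thousands of reads.
    __slots__ = ("seq", "quality")

    def __init__(self, seq, quality):
        if len(seq) != len(quality):
            raise ValueError("sequence and quality differ in length")
        self.seq = seq
        self.quality = quality

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # Slices return a new read; integers return (letter, score).
        if isinstance(index, slice):
            return QualityRead(self.seq[index], self.quality[index])
        return self.seq[index], self.quality[index]


reads = [
    QualityRead("ACGT", (40, 38, 35, 20)),
    QualityRead("GGCA", (39, 39, 30, 12)),
]
# For example, trim the last base from every read.
trimmed = [read[:-1] for read in reads]
```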

> For implementing some details new style classes would be better. Are you
> planning to move Seq and SeqRecord to the new style?

If we have a good reason to - adding docstrings to the properties would be nice.
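
For anyone unfamiliar with the idea, this is roughly what a documented
property looks like on a new-style class (one that inherits from
object).  The class and attribute names are invented for the example
and are not taken from the Biopython source.

```python
class ExampleRecord(object):
    """Hypothetical new-style class, used only to show a property."""

    def __init__(self, seq):
        self._seq = seq

    @property
    def seq(self):
        """The sequence itself (read-only in this sketch)."""
        return self._seq


record = ExampleRecord("ACGT")
letters = record.seq                   # calls the getter above
help_text = ExampleRecord.seq.__doc__  # the property's docstring
```

The docstring then shows up in help(ExampleRecord), which a plain
instance attribute cannot offer.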

Peter

