[Biopython-dev] about the SeqRecord and SeqFeature classes
Jose Blanca
jblanca at btc.upv.es
Fri Sep 26 09:18:02 EDT 2008
Hi:
> I see you have created a subclass of SeqRecord to add a quality
> property, and made sure this gets sliced too in the __getitem__. This
> is a nice approach (and demonstrates how people could extend the basic
> Biopython objects in their own code). I would also suggest in the
> __init__ method checking that the quality sequence is the same length
> as the sequence itself.
To do that properly I would like to use a property; that's why I was
asking about the possibility of turning SeqRecord and Seq into new-style
classes.
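
As a rough illustration of the property idea (a sketch only: QualityRecord
is a hypothetical stand-in class, not part of Biopython, and it assumes a
Python version with new-style classes and the property setter decorator),
the length check can live in a setter so it runs both in __init__ and on
every later assignment:

```python
class QualityRecord(object):
    """Hypothetical stand-in for a SeqRecord-like class (not Biopython's API)."""

    def __init__(self, seq, quality):
        self.seq = seq
        # This assignment goes through the property setter below,
        # so the length check also runs at construction time.
        self.quality = quality

    @property
    def quality(self):
        return self._quality

    @quality.setter
    def quality(self, values):
        values = tuple(values)
        if len(values) != len(self.seq):
            raise ValueError(
                "quality has %d values but seq has %d letters"
                % (len(values), len(self.seq))
            )
        self._quality = values
```

With a plain attribute, a later assignment of a quality list with the wrong
length would go unnoticed; with the property it raises ValueError.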
> If we were to add something like this to Biopython directly, I prefer
> "quality" over "qual" (just three letters longer but much clearer).
That's not a problem. I used qual to make it similar to .seq.
> I would also consider adding the quality to the Seq object (subclassing
> the Seq object rather than the SeqRecord object). My reasoning is
> that for 454 or Solexa sequencing, you will have thousands of reads
> and all you really care about is the nucleotide sequence and the
> quality scores. Unless you want to give them all unique names, there
> is little point in having the overhead of the various annotation properties
> of the SeqRecord.
I didn't subclass Seq because if we want a quality without a name we could just
use a tuple or a list. My idea was to create a class with two main
properties, seq and qual (or quality). Seq does not have a seq property; it is
a sequence. Since SeqRecord already has a seq property, I subclassed it and
added the qual property. An alternative would be to create a new
SeqWithQuality class without subclassing SeqRecord.
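
A minimal sketch of that standalone alternative, under stated assumptions:
the class below is hypothetical (it is not Biopython code), it stores the
sequence as any sliceable object such as a str, and it keeps the qualities
as a tuple of ints. Slicing returns a new object with seq and qual cut to
the same range.

```python
class SeqWithQuality(object):
    """Hypothetical standalone class holding a sequence and its qualities."""

    def __init__(self, seq, qual, name=None):
        qual = tuple(qual)
        if len(seq) != len(qual):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        self.qual = qual
        self.name = name

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        # A slice gives a new SeqWithQuality covering the same range
        # of both the sequence and the quality values.
        if isinstance(index, slice):
            return SeqWithQuality(self.seq[index], self.qual[index], self.name)
        # A single position gives a (letter, quality) pair.
        return self.seq[index], self.qual[index]


read = SeqWithQuality("ACGTAC", [30, 31, 32, 33, 34, 35], name="read1")
trimmed = read[1:4]
```

Here trimmed.seq and trimmed.qual stay in step, which is the behaviour the
sliced __getitem__ in the SeqRecord subclass is meant to provide.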
I looked at the BioPerl model. They have several classes dealing with
sequences and qualities:
Seq: - has a seq property (unlike Biopython's Seq, which is itself a sequence
and has no seq property). It also has an id or a name.
Qual: - has a qual property, and an id or a name.
SeqWithQual: - has seq and Qual properties.
I didn't create a Qual class with a qual property and a name because there is
no Seq class with a seq and a name. I thought that a tuple or a list of ints
would be the equivalent of Biopython's Seq and would play the role of
BioPerl's Qual.
What do you think about this model?
I agree that these classes should be prepared to deal with a lot of sequences
and that they should be efficient. But I don't have the experience to foresee
which model would be better in that regard.
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)