[Biopython-dev] about the SeqRecord and SeqFeature classes

Jose Blanca jblanca at btc.upv.es
Fri Sep 26 13:18:02 UTC 2008


Hi:

> I see you have created a subclass of the SeqRecord to add a quality
> property, and made sure this gets sliced too in the __getitem__.  This
> is a nice approach (and demonstrates how people could extend the basic
> Biopython objects in their own code).  I would also suggest in the
> __init__ method checking that the quality sequence is the same length
> as the sequence itself.
To do that properly I would like to use a property; that's why I was
asking about the possibility of turning SeqRecord and Seq into new-style
classes.
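Here is a minimal sketch of the property-based length check. It assumes a new-style class (one that derives from `object`). The class name `QualRecord` and its attributes are hypothetical illustrations, not Biopython's API. To stay self-contained, it stores the sequence as a plain string rather than a Seq object.

```python
class QualRecord(object):
    """Hypothetical record pairing a sequence with per-letter quality scores."""

    def __init__(self, seq, quality=None):
        self.seq = seq
        self._quality = None
        if quality is not None:
            # Route through the property setter so the length check runs.
            self.quality = quality

    @property
    def quality(self):
        return self._quality

    @quality.setter
    def quality(self, value):
        value = tuple(value)
        # One quality score is expected for every letter of the sequence.
        if len(value) != len(self.seq):
            raise ValueError(
                "quality has %i values but sequence has %i letters"
                % (len(value), len(self.seq)))
        self._quality = value
```

Because the check lives in the setter, it runs on later assignments such as `rec.quality = [...]`, not only in `__init__`. With a Python 2 old-style class, that assignment would silently bypass the setter and create an ordinary instance attribute instead.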

> If we were to add something like this to Biopython directly, I prefer
> "quality" over "qual" (just three letters longer but much clearer). 
That's not a problem. I used qual to make it look similar to .seq.

> I would also consider adding the quality to the Seq object (subclassing
> the Seq object rather than the SeqRecord object).  My reasoning is
> that for 454 or Solexa sequencing, you will have thousands of reads
> and all you really care about is the nucleotide sequence and the
> quality scores.  Unless you want to give them all unique names, there is
> little point in having the overhead of the various annotation properties
> of the SeqRecord.
I didn't subclass Seq because, if we want a quality without a name, we could
just use a tuple or a list. My idea was to create a class with two main
properties, seq and qual (or quality). Seq does not have a seq property; it is
a sequence itself. Since SeqRecord already has a seq property, I subclassed it
and added the qual property. An alternative would be to create a new
SeqWithQuality class that does not subclass SeqRecord.
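The standalone SeqWithQuality alternative could look roughly like this. Everything here is an assumption made for illustration: the class name, the `seq`/`qual` attribute names, and returning a `(letter, score)` pair for an integer index. The point it shows is that `__getitem__` slices the sequence and the qualities together, so the two cannot drift apart.

```python
class SeqWithQuality(object):
    """Hypothetical container for a sequence and its per-letter qualities."""

    def __init__(self, seq, qual):
        if len(seq) != len(qual):
            raise ValueError("seq and qual must have the same length")
        self.seq = seq
        # A plain tuple of ints stands in for a quality object.
        self.qual = tuple(qual)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # Slice both in step so letters and scores stay aligned.
            return SeqWithQuality(self.seq[index], self.qual[index])
        # A single position yields the letter and its score.
        return self.seq[index], self.qual[index]

    def __repr__(self):
        return "SeqWithQuality(%r, %r)" % (self.seq, self.qual)
```

For example, `SeqWithQuality("ACGTA", [30, 31, 32, 33, 34])[1:4]` gives an object whose `seq` is `"CGT"` and whose `qual` is `(31, 32, 33)`.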
I looked at the BioPerl model. They have several classes dealing with 
sequences and qualities:
Seq: has a seq property (unlike Biopython's Seq, which is itself a sequence
and has no seq property). It also has an id or a name.
Qual: has a qual property and an id or a name.
SeqWithQual: has seq and Qual properties.
I didn't create a Qual class with a qual property and a name because there is
no Seq class with a seq and a name. I thought that a tuple or a list of ints
would be the quality equivalent of Biopython's Seq and would play the part of
BioPerl's Qual.
What do you think about this model?
I agree that these classes should be prepared to deal with a lot of sequences
and that they should be efficient. But I don't have the experience to foresee
which model would be better in that regard.
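One standard Python technique for cutting per-object memory when creating many small objects is `__slots__`. It stops each instance from carrying its own `__dict__`. This is a swapped-in suggestion, not something measured here. The `Read` class below is hypothetical, and whether the savings matter for 454 or Solexa datasets would need profiling.

```python
class Read(object):
    """Hypothetical lightweight read: sequence plus qualities, no annotations."""

    # Fixed attribute set: instances get no per-object __dict__,
    # so arbitrary extra attributes cannot be added.
    __slots__ = ("seq", "qual")

    def __init__(self, seq, qual):
        # Validation omitted to keep the focus on memory layout.
        self.seq = seq
        self.qual = tuple(qual)
```

The trade-off is flexibility. A slotted object cannot grow ad hoc annotations the way a SeqRecord can, which fits the case where only the sequence and its qualities matter.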


-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)