[Biopython-dev] [BioPython] about the SeqRecord slicing

Jose Blanca jblanca at btc.upv.es
Fri Mar 27 04:22:27 EDT 2009


On Thursday 26 March 2009 16:32:23 Peter wrote:

> The SeqRecord __getitem__ would have to return a SeqRecord when given
> a single integer index, holding a single letter sequence.  What about
> the name/id/description and annotations (e.g. organism) - do they
> really apply to a single letter from the sequence?  Technically
> writing the code to offer this isn't such a problem, but I am
> unconvinced this is the best behaviour for normal usage.
You're right, I was not thinking about the rest of the properties because I
don't need them. They're a problem when slicing and adding SeqRecords, but
they're also a problem in standard slicing. Should the annotations be kept
when the SeqRecord is sliced? Are they still relevant? No single behaviour
will be right for all cases.

> Also closely related to this, what would you expect __iter__ to
> iterate over?  Currently it acts like iteration over the record's
> sequence.
The SeqRecord can already hold a sequence of length one, so we have the same 
problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord 
that I want. 
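
To make the seq_rec[n:n+1] idea concrete, here is a minimal sketch. ToyRecord
is a hypothetical stand-in written only for this example; it is not
Biopython's SeqRecord. Keeping the id and dropping the annotations on a slice
is an assumption made for illustration, not a settled behaviour.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord; not Biopython's class."""
    seq: str
    id: str = "<unknown id>"
    annotations: dict = field(default_factory=dict)

    def __getitem__(self, index):
        if isinstance(index, int):
            # Integer index: return the bare letter, matching iteration
            # over the record's sequence.
            return self.seq[index]
        if isinstance(index, slice):
            # Slice: return a new record. Keeping the id and dropping the
            # annotations is just one possible policy (an assumption here).
            return ToyRecord(self.seq[index], id=self.id)
        raise TypeError("index must be an int or a slice")


rec = ToyRecord("ACGTAC", id="geneX", annotations={"organism": "example"})
n = 2
letter = rec[n]         # a plain one-letter string
single = rec[n:n + 1]   # a record holding a one-letter sequence
```

With this policy the integer index and the length-one slice coexist: the
first gives a letter, the second gives a record, and the caller chooses.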

> You'd also want the SeqRecord to support __add__ (and __radd__) so
> that two SeqRecord objects can be added together.  I have thought
> about this before, and it is a *much* more complicated issue due to
> the meta data.  In general the only safe and unambiguous choice is to
> exclude it from the combined record:
> * sequence - just add (using normal rules for adding Seq objects)
> * name/id/description - if the two agree, use that?  Otherwise default
> to a blank value?
> * annotations - for each keyed value, you could combine the entries?
> Or just throwing them all away?
> * letter_annotations - if an entry is present in both you can combine
> it.  Otherwise throw them away?
> * features - these could be combined, adjusting the locations for one
> record's features as appropriate
As I said before, I think the same problem arises when you take a slice. If I
have the sequence of a gene named X with some annotations and I slice out a
part, is it still named geneX? Should the annotations be kept?
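
The merge rules quoted above could be sketched roughly as follows. ToyRecord
and ToyFeature are hypothetical classes invented for this example, not
Biopython's API, and each choice (blank id on disagreement, dropping the
annotations, keeping only shared per-letter entries) is just one of the
options listed, picked for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeature:
    """Hypothetical feature with a half-open [start, end) location."""
    start: int
    end: int
    label: str


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord; not Biopython's class."""
    seq: str
    id: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)

    def __add__(self, other):
        if not isinstance(other, ToyRecord):
            return NotImplemented
        offset = len(self.seq)
        # Keep the id only when both records agree; otherwise blank it.
        new_id = self.id if self.id == other.id else ""
        # Per-letter annotations survive only when both records have the key.
        shared = self.letter_annotations.keys() & other.letter_annotations.keys()
        letters = {
            key: list(self.letter_annotations[key])
            + list(other.letter_annotations[key])
            for key in shared
        }
        # Copy the left features; shift the right ones by the left length.
        feats = [ToyFeature(f.start, f.end, f.label) for f in self.features]
        feats += [
            ToyFeature(f.start + offset, f.end + offset, f.label)
            for f in other.features
        ]
        # Record-level annotations are dropped (the "throw away" option).
        return ToyRecord(
            self.seq + other.seq,
            id=new_id,
            letter_annotations=letters,
            features=feats,
        )


left = ToyRecord(
    "ACGT",
    id="geneX",
    annotations={"organism": "example"},
    letter_annotations={"quality": [30, 31, 32, 33]},
    features=[ToyFeature(0, 2, "left")],
)
right = ToyRecord(
    "GG",
    id="geneY",
    letter_annotations={"quality": [20, 21], "other": [1, 2]},
    features=[ToyFeature(0, 2, "right")],
)
combined = left + right
```

Here the ids disagree, so the combined record gets a blank id, and the
feature from the right-hand record ends up at positions 4 to 6.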

> I'm not ruling out adding SeqRecord addition, but I don't want to rush
> it while we are trying to get Biopython 1.50 done.
That's quite sensible. I think it is a good thing to discuss all these
issues; I keep learning a lot from you.
Best regards,

-- 
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)

