[Biopython-dev] [BioPython] about the SeqRecord slicing
Jose Blanca
jblanca at btc.upv.es
Fri Mar 27 08:22:27 UTC 2009
On Thursday 26 March 2009 16:32:23 Peter wrote:
> The SeqRecord __getitem__ would have to return a SeqRecord when given
> a single integer index, holding a single letter sequence. What about
> the name/id/description and annotations (e.g. organism) - do they
> really apply to a single letter from the sequence? Technically
> writing the code to offer this isn't such a problem, but I am
> unconvinced this is the best behaviour for normal usage.
You're right, I was not thinking about the rest of the properties because I
don't need them. They're a problem when slicing and adding SeqRecords. But
they're also a problem in standard slicing. Should the annotations be kept when
the SeqRecord is sliced? Are they still relevant? No single behaviour will be
right for all cases.
> Also closely related to this, what would you expect __iter__ to
> iterate over? Currently it acts like iteration over the record's
> sequence.
The SeqRecord can already hold a sequence of length one, so we have the same
problem. In fact I could do seq_rec[n:n+1] and I would obtain the SeqRecord
that I want.
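To make the difference concrete, here is a minimal sketch using a toy class.
ToyRecord is a hypothetical stand-in, not Biopython's SeqRecord, and the choice
to copy the id and drop the annotations on slicing is only an assumption made
for illustration. An integer index gives back a bare letter, while the slice
n:n+1 gives back a record of length one.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a sequence record (not Biopython's SeqRecord)."""
    seq: str
    id: str = ""
    annotations: dict = field(default_factory=dict)

    def __len__(self):
        return len(self.seq)

    def __getitem__(self, index):
        if isinstance(index, slice):
            # A slice yields a new record. Copying the id and dropping the
            # annotations is an assumed policy, one of several possible.
            return ToyRecord(self.seq[index], id=self.id)
        # An integer index yields a bare letter with no metadata attached.
        return self.seq[index]


rec = ToyRecord("ACGTAC", id="geneX", annotations={"organism": "example"})
n = 2
letter = rec[n]        # a plain one-letter string
single = rec[n:n + 1]  # a record holding a sequence of length one
```

With this policy the length-one record keeps the id but loses the annotations,
which is exactly the kind of decision that has to be made for any slice.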
> You'd also want the SeqRecord to support __add__ (and __radd__) so
> that two SeqRecord objects can be added together. I have thought
> about this before, and it is a *much* more complicated issue due to
> the meta data. In general the only safe and unambiguous choice is to
> exclude it from the combined record:
> * sequence - just add (using normal rules for adding Seq objects)
> * name/id/description - if the two agree, use that? Otherwise default
> to a blank value?
> * annotations - for each keyed value, you could combine the entries?
> Or just throwing them all away?
> * letter_annotations - if an entry is present in both you can combine
> it. Otherwise throw them away?
> * features - these could be combined, adjusting the locations for one
> record's features as appropriate
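The rules listed above could be sketched roughly as follows. This is only an
illustration under stated assumptions: ToyRecord and ToyFeature are
hypothetical classes, not Biopython's, and where the list leaves a question
open the sketch simply picks one answer (blank on disagreement, keep only
annotations that agree, keep only letter_annotations present in both).

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeature:
    """Hypothetical feature with a start/end location on the sequence."""
    start: int
    end: int
    label: str = ""


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a sequence record (not Biopython's SeqRecord)."""
    seq: str
    id: str = ""
    name: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)

    def __add__(self, other):
        if not isinstance(other, ToyRecord):
            return NotImplemented
        offset = len(self.seq)

        def agree(a, b):
            # Keep a value only if both records agree, otherwise blank it.
            return a if a == b else ""

        # Assumed policy: keep an annotation only if both records carry the
        # same key with the same value.
        annotations = {
            key: value
            for key, value in self.annotations.items()
            if key in other.annotations and other.annotations[key] == value
        }
        # Per-letter annotations are concatenated when present in both.
        letter_annotations = {
            key: list(values) + list(other.letter_annotations[key])
            for key, values in self.letter_annotations.items()
            if key in other.letter_annotations
        }
        # Features of the right-hand record are shifted by the left length.
        features = [ToyFeature(f.start, f.end, f.label) for f in self.features]
        features += [
            ToyFeature(f.start + offset, f.end + offset, f.label)
            for f in other.features
        ]
        return ToyRecord(
            seq=self.seq + other.seq,
            id=agree(self.id, other.id),
            name=agree(self.name, other.name),
            description=agree(self.description, other.description),
            annotations=annotations,
            letter_annotations=letter_annotations,
            features=features,
        )


left = ToyRecord(
    "ACGT", id="geneX", name="X",
    annotations={"organism": "example"},
    letter_annotations={"quality": [30, 31, 32, 33]},
    features=[ToyFeature(0, 2, "a")],
)
right = ToyRecord(
    "TTGA", id="geneY", name="X",
    annotations={"organism": "example", "source": "lab"},
    letter_annotations={"quality": [20, 21, 22, 23]},
    features=[ToyFeature(1, 3, "b")],
)
combined = left + right
```

Here the ids disagree, so the combined id is blank, while the shared name and
the shared organism annotation survive.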
As I said before, I think the same problem arises when you take a slice. If I
have the sequence of a gene named X with some annotations and I slice out a
part, should it still be named geneX? Should the annotations be kept?
> I'm not ruling out adding SeqRecord addition, but I don't want to rush
> it while we are trying to get Biopython 1.50 done.
That's quite sensible. I think it is a good thing to discuss all these
issues; I keep learning a lot from you.
Best regards,
--
Jose M. Blanca Postigo
Instituto Universitario de Conservacion y
Mejora de la Agrodiversidad Valenciana (COMAV)
Universidad Politecnica de Valencia (UPV)
Edificio CPI (Ciudad Politecnica de la Innovacion), 8E
46022 Valencia (SPAIN)
Tlf.:+34-96-3877000 (ext 88473)