[Biopython-dev] [BioPython] about the SeqRecord slicing

Peter biopython at maubp.freeserve.co.uk
Thu Mar 26 15:32:23 UTC 2009


On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:07:33 Peter wrote:
>> However, if I understand you, when pulling a column from a SeqRecord
>> based alignment, in addition to the column's sequence you'd like to get the
>> per-letter-annotations as well.  This assumes that all the SeqRecord
>> objects in the alignment have the same per-letter-annotation present - some
>> might have quality and others might not!  But how would you want to store
>> this new column object?  Using a string or a Seq doesn't support any
>> annotation - you *could* use a SeqRecord with no id, name, description,
>> features, annotation - just a sequence and any common
>> per-letter-annotation.  Is this what you had in mind?
>
> Yes, that's exactly what I have in mind. Do you see any problem with that
> approach?
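The column object described above (just the column's letters plus any per-letter annotation shared by every row) can be sketched without Biopython itself. `ToyRecord` and `alignment_column` are hypothetical names, not part of Biopython's API. The sketch assumes a non-empty list of records, each at least `index + 1` letters long, with per-letter annotations stored as lists the same length as the sequence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical stand-in for a SeqRecord: a sequence plus per-letter annotations."""
    seq: str
    letter_annotations: dict = field(default_factory=dict)


def alignment_column(records, index):
    """Return column `index` as a ToyRecord with no id/name/description.

    Only per-letter annotations present in *every* record are kept, because
    some records might have (say) quality scores while others do not.
    """
    letters = "".join(rec.seq[index] for rec in records)
    common_keys = set(records[0].letter_annotations)
    for rec in records[1:]:
        common_keys &= set(rec.letter_annotations)
    column_annotations = {
        key: [rec.letter_annotations[key][index] for rec in records]
        for key in sorted(common_keys)
    }
    return ToyRecord(letters, column_annotations)


aln = [
    ToyRecord("ACGT", {"phred_quality": [40, 38, 30, 20]}),
    ToyRecord("AGGT", {"phred_quality": [35, 36, 37, 38], "solexa_quality": [1, 2, 3, 4]}),
]
col = alignment_column(aln, 1)
# col.seq == "CG"; only the shared annotation survives:
# col.letter_annotations == {"phred_quality": [38, 36]}
```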

Well yes.  For your code to work on SeqRecord objects (based on the
verbal description earlier), it needs at least the following changes
to the SeqRecord:

The SeqRecord __getitem__ would have to return a SeqRecord when given
a single integer index, holding a single letter sequence.  What about
the name/id/description and annotations (e.g. organism) - do they
really apply to a single letter from the sequence?  Technically,
writing the code for this isn't a problem, but I am not convinced
this is the best behaviour for normal usage.
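To make the trade-off concrete, here is a minimal sketch of the integer-index behaviour under discussion. `ToyRecord` is a hypothetical stand-in, not Biopython's `SeqRecord`; it assumes per-letter annotations are lists or strings the same length as the sequence, and it only handles integer indices.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical, simplified SeqRecord-like class (not Biopython's API)."""
    seq: str
    id: str = "<unknown id>"
    description: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)

    def __getitem__(self, index):
        if not isinstance(index, int):
            raise TypeError("this sketch only handles integer indices")
        # Normalise negative indices; raises IndexError when out of range.
        i = range(len(self.seq))[index]
        # Design choice under discussion: return a one-letter record holding only
        # the matching slice of each per-letter annotation. Whole-record metadata
        # (id, description, annotations such as organism) is deliberately dropped.
        return ToyRecord(
            self.seq[i:i + 1],
            letter_annotations={
                key: value[i:i + 1] for key, value in self.letter_annotations.items()
            },
        )


rec = ToyRecord(
    "ACGT",
    id="read1",
    annotations={"organism": "Escherichia coli"},
    letter_annotations={"phred_quality": [40, 38, 30, 20]},
)
letter = rec[-2]
# letter.seq == "G", letter.letter_annotations == {"phred_quality": [30]}
```

Slicing each annotation with `value[i:i + 1]` keeps its type (list or string). Returning only the letter instead would avoid the metadata question entirely, which is the simpler behaviour this is being weighed against.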

Closely related to this, what would you expect __iter__ to
iterate over?  Currently it acts like iteration over the record's
sequence.
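The two possible answers for `__iter__` can be contrasted in a small sketch. Again `ToyRecord` is hypothetical, and `iter_letter_records` is an invented method name used only for illustration; it assumes per-letter annotations are lists or strings the same length as the sequence.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical SeqRecord-like class; not Biopython's API."""
    seq: str
    letter_annotations: dict = field(default_factory=dict)

    def __iter__(self):
        # Behaviour described in the message: iterating over a record is like
        # iterating over its sequence, yielding plain one-character strings.
        return iter(self.seq)

    def iter_letter_records(self):
        # Alternative that would match an integer __getitem__ returning records:
        # yield one-letter records carrying their slice of each annotation.
        for i in range(len(self.seq)):
            yield ToyRecord(
                self.seq[i:i + 1],
                {key: value[i:i + 1] for key, value in self.letter_annotations.items()},
            )


rec = ToyRecord("ACG", {"phred_quality": [40, 30, 20]})
letters = list(rec)                        # ["A", "C", "G"]
parts = list(rec.iter_letter_records())    # three one-letter ToyRecord objects
```

Changing `__iter__` itself to the second form would silently alter code that currently treats a record like its sequence, which is part of why the two questions are linked.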

You'd also want the SeqRecord to support __add__ (and __radd__) so
that two SeqRecord objects can be added together.  I have thought
about this before, and it is a *much* more complicated issue due to
the metadata.  In general, the only safe and unambiguous choice is to
exclude it from the combined record:
* sequence - just add (using normal rules for adding Seq objects)
* name/id/description - if the two agree, use that?  Otherwise default
to a blank value?
* annotations - for each keyed value, you could combine the entries?
Or just throw them all away?
* letter_annotations - if an entry is present in both you can combine
it.  Otherwise throw them away?
* features - these could be combined, adjusting the locations of the
second record's features as appropriate
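The rules listed above can be tried out on a toy class. This is a minimal sketch, not Biopython code: `ToyRecord` is hypothetical, features are simplified to `(start, end, type)` tuples, and where the list leaves a choice open (annotations), the sketch picks one option, keeping only entries whose values are identical in both records. `__radd__` lets a plain string on the left be treated as a record with no metadata.

```python
from dataclasses import dataclass, field


@dataclass
class ToyRecord:
    """Hypothetical SeqRecord-like class used to explore addition rules."""
    seq: str
    id: str = ""
    name: str = ""
    description: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    # Features simplified to (start, end, type) tuples.
    features: list = field(default_factory=list)

    def __add__(self, other):
        if isinstance(other, str):
            # A plain string is treated as a record with no metadata at all.
            other = ToyRecord(other)
        if not isinstance(other, ToyRecord):
            return NotImplemented
        offset = len(self.seq)
        return ToyRecord(
            seq=self.seq + other.seq,
            # name/id/description: keep only if both records agree, else blank.
            id=self.id if self.id == other.id else "",
            name=self.name if self.name == other.name else "",
            description=self.description if self.description == other.description else "",
            # annotations: one option among several, keep identical entries only.
            annotations={
                key: value for key, value in self.annotations.items()
                if key in other.annotations and other.annotations[key] == value
            },
            # letter_annotations: combine entries present in both, drop the rest.
            letter_annotations={
                key: self.letter_annotations[key] + other.letter_annotations[key]
                for key in self.letter_annotations
                if key in other.letter_annotations
            },
            # features: shift the second record's locations past the first sequence.
            features=self.features
            + [(start + offset, end + offset, kind) for start, end, kind in other.features],
        )

    def __radd__(self, other):
        # Handles "ACGT" + record, where str itself cannot do the addition.
        if isinstance(other, str):
            return ToyRecord(other) + self
        return NotImplemented


a = ToyRecord("ACGT", id="x", description="left",
              annotations={"organism": "E. coli", "note": 1},
              letter_annotations={"phred_quality": [40, 40, 30, 30]},
              features=[(0, 2, "motif")])
b = ToyRecord("GG", id="x", description="right",
              annotations={"organism": "E. coli"},
              letter_annotations={"phred_quality": [20, 20], "other": [1, 2]},
              features=[(0, 1, "site")])
c = a + b     # seq "ACGTGG", id "x", description "" (the two disagree)
d = "NN" + a  # uses __radd__; per-letter annotations are dropped
```

Even in this simplified form, most fields need a policy decision, which illustrates why addition is harder than it first looks.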

I'm not ruling out adding SeqRecord addition, but I don't want to rush
it while we are trying to get Biopython 1.50 done.

Peter



