[Biopython-dev] [BioPython] about the SeqRecord slicing
Peter
biopython at maubp.freeserve.co.uk
Thu Mar 26 15:32:23 UTC 2009
On Thu, Mar 26, 2009 at 3:14 PM, Jose Blanca <jblanca at btc.upv.es> wrote:
> On Thursday 26 March 2009 16:07:33 Peter wrote:
>> However, if I understand you, when pulling a column from a SeqRecord
>> based alignment in addition to the column's sequence you'd like to get the
>> per-letter-annotations as well. This assumes that all the SeqRecord
>> objects in the alignment have the same per-letter-annotation present - some
>> might have quality and others might not! But how would you want to store
>> this new column object? Using a string or a Seq doesn't support any
>> annotation - you *could* use a SeqRecord with no id, name, description,
>> features, annotation - just a sequence and any common
>> per-letter-annotation. Is this what you had in mind?
>
> Yes, that's exactly what I have in mind. Do you see any problem with that
> approach?
Well yes. For your code to work on SeqRecord objects (based on the
verbal description earlier), it needs at least the following changes
to the SeqRecord:
The SeqRecord __getitem__ would have to return a SeqRecord when given
a single integer index, holding a single letter sequence. What about
the name/id/description and annotations (e.g. organism) - do they
really apply to a single letter from the sequence? Technically
writing the code to offer this isn't such a problem, but I am
unconvinced this is the best behaviour for normal usage.
Also closely related to this, what would you expect __iter__ to
iterate over? Currently it acts like iteration over the record's
sequence.
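
To make the indexing question concrete, here is a minimal sketch in plain Python. IndexableRecord is a hypothetical stand-in written for this illustration, not Biopython's SeqRecord, and dropping the id on a single-letter result is just one possible answer to the question raised above.

```python
from dataclasses import dataclass, field


@dataclass
class IndexableRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""
    seq: str
    id: str = ""
    letter_annotations: dict = field(default_factory=dict)

    def __getitem__(self, index):
        if isinstance(index, int):
            # Proposed behaviour: return a one-letter record that keeps the
            # per-letter annotation but drops the record-level id.
            return IndexableRecord(
                self.seq[index],
                letter_annotations={
                    key: [values[index]]
                    for key, values in self.letter_annotations.items()
                },
            )
        raise TypeError("only integer indexing is sketched here")

    def __iter__(self):
        # Behaviour described in the message: iterate over the sequence letters.
        return iter(self.seq)


rec = IndexableRecord("ACGT", id="read1",
                      letter_annotations={"quality": [30, 31, 20, 10]})
single = rec[2]      # a one-letter record under the proposal
letters = list(rec)  # plain letters, as with sequence iteration
```

Under these assumptions rec[2] is a record while iterating over rec yields bare letters, which is exactly the inconsistency the __iter__ question is about.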
You'd also want the SeqRecord to support __add__ (and __radd__) so
that two SeqRecord objects can be added together. I have thought
about this before, and it is a *much* more complicated issue due to
the metadata. In general the only safe and unambiguous choice is to
exclude it from the combined record, but going field by field:
* sequence - just add (using normal rules for adding Seq objects)
* name/id/description - if the two agree, use that? Otherwise default
to a blank value?
* annotations - for each keyed value, you could combine the entries?
Or just throw them all away?
* letter_annotations - if an entry is present in both you can combine
it. Otherwise throw them away?
* features - these could be combined, adjusting the locations for one
record's features as appropriate
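
The rules listed above can be sketched in plain Python as follows. AddableRecord and ToyFeature are hypothetical classes invented for this illustration, not Biopython's API, and each choice marked in the comments (blank id on disagreement, discarding annotations) is only one of the options the list leaves open.

```python
from dataclasses import dataclass, field


@dataclass
class ToyFeature:
    """Hypothetical feature with a simple start/end location."""
    start: int
    end: int
    type: str = ""


@dataclass
class AddableRecord:
    """Hypothetical stand-in for a SeqRecord (not the Biopython class)."""
    seq: str
    id: str = ""
    annotations: dict = field(default_factory=dict)
    letter_annotations: dict = field(default_factory=dict)
    features: list = field(default_factory=list)

    def __add__(self, other):
        if not isinstance(other, AddableRecord):
            return NotImplemented
        offset = len(self.seq)
        # Only per-letter annotations present in both records survive.
        shared = sorted(self.letter_annotations.keys()
                        & other.letter_annotations.keys())
        return AddableRecord(
            seq=self.seq + other.seq,
            # Assumption: keep the id only if both records agree, else blank.
            id=self.id if self.id == other.id else "",
            # Assumption: the conservative option, discard all annotations.
            annotations={},
            letter_annotations={
                key: list(self.letter_annotations[key])
                + list(other.letter_annotations[key])
                for key in shared
            },
            # Features from the right-hand record are shifted by the length
            # of the left-hand sequence.
            features=[ToyFeature(f.start, f.end, f.type) for f in self.features]
            + [ToyFeature(f.start + offset, f.end + offset, f.type)
               for f in other.features],
        )


left = AddableRecord("ACGT", id="r1",
                     annotations={"organism": "E. coli"},
                     letter_annotations={"quality": [30, 31, 20, 10]},
                     features=[ToyFeature(0, 2, "gene")])
right = AddableRecord("GG", id="r2",
                      letter_annotations={"quality": [40, 41], "other": [1, 2]},
                      features=[ToyFeature(0, 2, "primer")])
combined = left + right
```

In this sketch the "other" per-letter annotation is dropped because only one record has it, and the second record's feature ends up at positions 4 to 6 of the combined sequence.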
I'm not ruling out adding SeqRecord addition, but I don't want to rush
it while we are trying to get Biopython 1.50 done.
Peter