[Biopython-dev] MAF Parser/Writer/Indexer
Peter Cock
p.j.a.cock at googlemail.com
Mon May 16 20:53:55 UTC 2011
On Mon, May 16, 2011 at 9:33 PM, Andrew Sczesnak wrote:
> On 05/16/2011 01:58 PM, Peter Cock wrote:
>>
>> That is a subtlety I missed - maybe it is simpler to ignore speciesOrder
>> after all. I presume it is really intended as a graphical output directive.
>
> Fine by me. If need be we can add this later.
>
>>> This is interesting. I wonder if it makes sense to preserve this
>>> information if a SeqRecord is going to be manipulated outside a
>>> MultipleSeqAlignment object. Could this be accomplished by
>>> migrating the annotation information to a SeqFeature?
>>
>> I'm not sure how using a SeqFeature would work here.
>
> Hmm, well, strand is manipulated in a SeqFeature when .reverse_complement()
> is run, right? I thought that might take care of that. Though truthfully I
> haven't looked too much at that code.
The SeqFeature is for describing (part of) a SeqRecord, and both
have a reverse_complement method for when you want to flip the
sequence and all the features on it.
>> Also consider that someone might manipulate the alignment
>> directly, e.g. alignment[:,10:60] to pull out fifty columns. That
>> seems like a use case where the start/end co-ordinates should
>> be updated nicely. Note that internally this calls record[10:60]
>> for each row of the alignment, so using SeqRecord objects.
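To make "updated nicely" concrete, here is a rough sketch of what taking columns from one gapped row implies for its co-ordinates. It is plain Python under stated assumptions, not how Biopython does it: `start` is the zero-based position of the row's first residue, only non-gap characters advance the co-ordinate, and the returned end is exclusive. With MAF's convention that a minus-strand start is counted on the reverse strand, the same arithmetic should apply to either strand. The name `slice_row` is hypothetical.

```python
# Hypothetical sketch, not Biopython's implementation: slice columns
# [begin:stop] from one gapped alignment row and recompute its start/end.


def slice_row(seq, start, begin, stop):
    """Return (sub_seq, new_start, new_end) for the columns seq[begin:stop].

    start is the zero-based position of the row's first residue; the
    returned end is exclusive (a Python-style half-open interval).
    """
    prefix = seq[:begin]
    # Residues (not gaps) before the slice shift the new start along.
    skipped = len(prefix) - prefix.count("-")
    sub_seq = seq[begin:stop]
    # Residues kept inside the slice determine the new end.
    kept = len(sub_seq) - sub_seq.count("-")
    new_start = start + skipped
    return sub_seq, new_start, new_start + kept
```

Calling `slice_row(row, start, 10, 60)` on every row would mirror `alignment[:,10:60]`. For instance, `slice_row("AC--GTACGT", 100, 2, 6)` gives `("--GT", 102, 104)`.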
>
> That's true. Is there a more general way to implement this? Perhaps by moving
> the coordinate information out of .annotations and into fields that aren't
> MAF-specific.
That's what I was suggesting - the existing fasta-m10 parser can
also collect start/end/strand information, and there are obvious
potential uses with things like BLAST and HMMER too. One idea
might be to introduce a SeqRecord subclass - I'm not sure yet.
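As a rough illustration of format-neutral fields, the sketch below stores forward-strand, zero-based, half-open start/end plus a strand of +1/-1, and converts from MAF-style annotations. Everything here is hypothetical: the class name, and the assumed annotation keys `start`, `size`, `strand` (as "+"/"-") and `srcSize`. It is a plain dataclass rather than the SeqRecord subclass floated above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class AlignedLocation:
    """Hypothetical format-neutral co-ordinates for one alignment row.

    start/end are zero-based, half-open and always on the forward strand;
    strand is +1 or -1. Nothing here is MAF-specific.
    """

    start: int
    end: int
    strand: int

    @classmethod
    def from_maf(cls, annotations):
        """Build from MAF-style annotations (assumed keys: start, size,
        strand as "+" or "-", srcSize)."""
        start = annotations["start"]
        size = annotations["size"]
        if annotations["strand"] == "+":
            return cls(start, start + size, +1)
        # MAF counts minus-strand starts from the end of the source sequence,
        # so convert back to forward-strand positions.
        src_size = annotations["srcSize"]
        return cls(src_size - (start + size), src_size - start, -1)
```

A fasta-m10, BLAST or HMMER parser could, in principle, fill the same three fields from its own start/end/strand output. For example, a minus-strand MAF row with start 86, size 4 and srcSize 100 maps to `AlignedLocation(10, 14, -1)`.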
Peter