[Biopython-dev] MAF Parser/Writer/Indexer

Mon May 16 20:53:55 UTC 2011

On Mon, May 16, 2011 at 9:33 PM, Andrew Sczesnak wrote:
> On 05/16/2011 01:58 PM, Peter Cock wrote:
>>
>> That is a subtlety I missed - maybe it is simpler to ignore speciesOrder
>> after all. I presume it is really intended as a graphical output directive.
>
> Fine by me.  If need be we can add this later.
>
>>> This is interesting.  I wonder if it makes sense to preserve this
>>> information if a SeqRecord is going to be manipulated outside a
>>> MultipleSeqAlignment object.  Could this be accomplished by
>>> migrating the annotation information to a SeqFeature?
>>
>> I'm not sure how using a SeqFeature would work here.
>
> Hmm, well, strand is manipulated in a SeqFeature when .reverse_complement()
> is run, right?  I thought that might take care of that.  Though truthfully I
> haven't looked too much at that code.

The SeqFeature is for describing (part of) a SeqRecord, and both
have a reverse_complement method for when you want to flip the
sequence and all the features on it.
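
For what it's worth, the coordinate arithmetic behind such a flip is
simple. Here is a standalone sketch (plain Python, not the actual
Biopython code; flip_feature is a made-up helper name, and I'm assuming
0-based, end-exclusive coordinates):

```python
# Standalone sketch of the coordinate arithmetic only; this is not
# Biopython's implementation. flip_feature is a hypothetical helper.

def flip_feature(start, end, strand, seq_length):
    """Map a feature onto the reverse complement of its parent sequence.

    Coordinates are assumed to be 0-based and end-exclusive, and strand
    is +1 or -1. Returns the new (start, end, strand) tuple.
    """
    return seq_length - end, seq_length - start, -strand


# A forward-strand feature covering positions 2..4 of a 10 bp sequence
print(flip_feature(2, 5, +1, 10))
```

Applying the flip twice gives back the original coordinates, which is a
handy sanity check.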

>> Also consider that someone might manipulate the alignment
>> directly, e.g. alignment[:,10:60] to pull out fifty columns. That
>> seems like a use case where the start/end co-ordinates should
>> be updated nicely. Note that internally this calls record[10:60]
>> for each row of the alignment, so using SeqRecord objects.
>
> That's true.  Is there a more general way to implement this?  By dragging
> the coordinate information out of .annotations and into fields that aren't
> MAF-specific or something.

That's what I was suggesting - the existing fasta-m10 parser can
also collect start/end/strand information, and there are obvious
potential uses with things like BLAST and HMMER too. One idea
might be to introduce a SeqRecord subclass - I'm not sure yet.
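
To make the idea concrete, here is a rough standalone sketch of how
slicing could keep the start/end in step with the sequence. It does not
use Biopython at all: AlignedRow is a hypothetical stand-in for such a
subclass, and I'm assuming the start is a 0-based offset counted along
the stated strand (as MAF does), with "-" as the gap character:

```python
from dataclasses import dataclass


@dataclass
class AlignedRow:
    """Hypothetical stand-in for a coordinate-aware SeqRecord subclass."""

    seq: str         # gapped alignment row, "-" marks a gap
    start: int       # 0-based offset of the first letter in the source
    strand: int = 1  # +1 or -1, carried through slicing unchanged

    @property
    def end(self):
        # Only non-gap characters consume positions in the source sequence.
        return self.start + len(self.seq) - self.seq.count("-")

    def __getitem__(self, index):
        if not isinstance(index, slice):
            raise TypeError("this sketch only handles slices")
        begin, stop, step = index.indices(len(self.seq))
        if step != 1:
            raise ValueError("this sketch only handles contiguous slices")
        # Letters (not gaps) dropped from the left shift the start.
        skipped = begin - self.seq[:begin].count("-")
        return AlignedRow(self.seq[begin:stop], self.start + skipped, self.strand)


row = AlignedRow("AC--GTAC", start=100)
sub = row[3:7]
print(sub.seq, sub.start, sub.end)
```

Slicing every row of an alignment this way would give each row its own
updated start/end, since rows differ in where their gaps fall.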

Peter