[Biopython-dev] MAF Parser/Writer/Indexer
Peter Cock
p.j.a.cock at googlemail.com
Mon May 16 17:58:23 UTC 2011
On Mon, May 16, 2011 at 6:26 PM, Andrew Sczesnak
<andrew.sczesnak at med.nyu.edu> wrote:
> On 05/16/2011 07:14 AM, Peter Cock wrote:
>>
>> Do you think we should follow the speciesOrder directive if
>> present?
>
> Yeah, why not. I started working on this and the problem was, as defined in
> the spec, the species is just "hg19" or "mm9," yet the records are in
> species.chromosome format. Should we enforce that the species in a
> speciesOrder directive must exactly match a sequence identifier, or add a
> split and do some checks to make sure a record matches only one species in
> speciesOrder?
That is a subtlety I missed; maybe it is simpler to ignore speciesOrder
after all. I presume it is really intended as a directive for graphical output.
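If we did keep speciesOrder, Andrew's second option (split the identifier and check that each record matches exactly one speciesOrder entry) might look roughly like the plain-Python sketch below. The helper names species_for_id and order_by_species are hypothetical, not Biopython API, and the sketch assumes identifiers follow the species.chromosome convention mentioned above (e.g. "hg19.chr1").

```python
def species_for_id(record_id, species_order):
    """Return the single speciesOrder entry that record_id belongs to (hypothetical helper)."""
    # A record belongs to a species if its identifier is the species name
    # itself, or the species name followed by "." (e.g. "hg19.chr1").
    matches = [s for s in species_order
               if record_id == s or record_id.startswith(s + ".")]
    if len(matches) != 1:
        raise ValueError("%r matches %d speciesOrder entries: %r"
                         % (record_id, len(matches), matches))
    return matches[0]


def order_by_species(record_ids, species_order):
    """Sort record identifiers into speciesOrder order (hypothetical helper)."""
    rank = {s: i for i, s in enumerate(species_order)}
    return sorted(record_ids,
                  key=lambda rid: rank[species_for_id(rid, species_order)])


print(order_by_species(["mm9.chr10", "hg19.chr1"], ["hg19", "mm9"]))
# → ['hg19.chr1', 'mm9.chr10']
```

Matching on the species name plus a trailing dot, rather than a bare prefix, avoids "hg1" accidentally claiming "hg19.chr1", while still raising an error for identifiers that fit no entry or more than one.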
>> Also I think we may need to do something rigorous with start/end
>> co-ordinates and strand in either the Seq or SeqRecord object.
>> They could be updated automatically during slicing and taking
>> reverse complement... they might not survive addition though.
>
> This is interesting. I wonder if it makes sense to preserve this
> information if a SeqRecord is going to be manipulated outside a
> MultipleSeqAlignment object. Could this be accomplished by
> migrating the annotation information to a SeqFeature?
I'm not sure how using a SeqFeature would work here.
Also consider that someone might manipulate the alignment
directly, e.g. alignment[:,10:60] to pull out fifty columns. That
seems like a use case where the start/end co-ordinates should
be updated automatically. Note that internally this calls record[10:60]
on each row of the alignment, so the slicing is done by the SeqRecord
objects themselves.
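As a rough illustration of the bookkeeping involved, the following plain-Python sketch computes updated MAF-style start/size values when a row is sliced by columns, and the new start and strand after a reverse complement. The function names are hypothetical and not Biopython API. It assumes MAF conventions: start is zero-based and, for a "-" strand row, is counted on the reverse-complemented source. It also assumes the gapped row text is a plain string with "-" as the gap character, and it only handles simple slices with step 1.

```python
GAP = "-"


def ungapped_length(text):
    """Count the bases (non-gap characters) in a stretch of alignment text."""
    return sum(1 for c in text if c != GAP)


def slice_coords(gapped, start, col_start, col_end):
    """Return (new_start, new_size) for alignment columns [col_start:col_end].

    Assumes MAF conventions: start is zero-based, and for a "-" strand row
    it is counted on the reverse-complemented source, which matches the
    row text, so the same arithmetic works for either strand.
    """
    new_start = start + ungapped_length(gapped[:col_start])
    new_size = ungapped_length(gapped[col_start:col_end])
    return new_start, new_size


def reverse_complement_coords(start, size, strand, src_size):
    """Return (new_start, new_strand) after reverse complementing a row."""
    # A block at [start, start + size) on one strand sits at
    # [src_size - (start + size), src_size - start) on the other strand.
    new_strand = "-" if strand == "+" else "+"
    return src_size - (start + size), new_strand


# Row "ACG--TTA" starting at position 100 of a 1000 bp "+" strand source:
# keeping columns 2:6 (as alignment[:, 2:6] would) leaves "G--T".
print(slice_coords("ACG--TTA", 100, 2, 6))  # → (102, 2)
print(reverse_complement_coords(102, 2, "+", 1000))  # → (896, '-')
```

Because start only depends on how many real bases precede the first kept column, slicing reduces to counting non-gap characters. Concatenating two rows is another matter, which is why such coordinates might not survive addition.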
Peter