[Biopython-dev] MAF Parser/Indexer
Andrew Sczesnak
andrew.sczesnak at med.nyu.edu
Tue Apr 3 00:33:51 UTC 2012
Hi Peter,
Thank you for the feedback. I will try to make sure this code is well
tested before the next release.
> Is there any more about reverse complemented sequences
> and how they are handled, in simple iterators but more
> so when indexing? What I'm getting at here is the non-typical
> treatment of start and end being relative to the reverse
> complemented sequence for minus strand alignments. Here
> most tools/formats always count from the first base on the
> forward strand.
I'm not sure I'm understanding you correctly, but I hope so. In theory,
strandedness could be an issue; in practice, however, the reference
species in a multiz MAF file is always on the plus strand. To guard
against a user passing MafIndex.get_spliced() a MAF file whose blocks
have mixed strands, get_spliced() checks that all strands for the
reference species are the same. We also assume that the coordinates
given in a block always run in ascending order: they are given as
'start' and 'size', and we take them to cover the zero-based, half-open
interval [start, start + size).
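To make the coordinate convention you raised concrete, here is a minimal sketch. It assumes each MAF "s" line has already been parsed into a plain dict using the MAF spec's field names (src, start, size, strand, srcSize); that dict layout and both helper names are hypothetical illustrations, not the MafIndex API. The first helper is a reference-strand check in the spirit of the one in get_spliced(); the second flips a minus-strand record back to forward-strand coordinates, since MAF counts 'start' from the first base of the reverse-complemented source.

```python
# Sketch only: each MAF "s" line is assumed to be pre-parsed into a plain dict
# with the MAF spec's field names. The dict layout and both helper names are
# hypothetical illustrations, not the MafIndex API.

def check_reference_strand(blocks, ref_src):
    """Return the one strand used by ref_src in every block.

    Raises ValueError if the reference appears on mixed strands (or not at
    all), similar in spirit to the check described for get_spliced().
    """
    strands = {rec["strand"] for block in blocks for rec in block
               if rec["src"] == ref_src}
    if len(strands) != 1:
        raise ValueError("%s has mixed or missing strands: %r"
                         % (ref_src, sorted(strands)))
    return strands.pop()


def forward_interval(rec):
    """Return a zero-based, half-open (start, end) on the forward strand.

    For '-' records, MAF counts 'start' from the first base of the
    reverse-complemented source, so the interval is flipped using srcSize.
    """
    start, size = rec["start"], rec["size"]
    if rec["strand"] == "+":
        return start, start + size
    src_size = rec["srcSize"]
    return src_size - (start + size), src_size - start


# Toy data: one block with a plus-strand reference and a minus-strand row.
blocks = [[
    {"src": "hg18.chr1", "start": 100, "size": 10, "strand": "+", "srcSize": 1000},
    {"src": "mm9.chr4", "start": 50, "size": 10, "strand": "-", "srcSize": 500},
]]

print(check_reference_strand(blocks, "hg18.chr1"))  # +
print(forward_interval(blocks[0][1]))               # (440, 450)
```

The minus-strand row covering bases 50 to 60 of the reverse complement maps to 440 to 450 on the forward strand of a 500 bp source.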
There could still be an issue if the best alignment for a particular
species swaps strands between alignment blocks and/or exons of a
transcript. It is probably safe to assume, though, that the user wants
the best alignment wherever it occurs, rather than strand consistency.
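Rather than rejecting such alignments, one option would be to simply report which sequences switch strands so the user can decide. A minimal sketch, reusing the same hypothetical dict-per-"s"-line layout; the helper name is illustrative and not part of MafIndex. Note that it keys on the full src name (e.g. species.chromosome), so a species whose best hit moves to a different chromosome shows up as two separate entries.

```python
from collections import defaultdict

# Sketch only: reuses the hypothetical dict-per-"s"-line layout; the helper
# name is illustrative, not part of MafIndex.

def strand_swaps(blocks):
    """Return {src: set_of_strands} for every source seen on both strands."""
    seen = defaultdict(set)
    for block in blocks:
        for rec in block:
            seen[rec["src"]].add(rec["strand"])
    # Keep only sources that appear on more than one strand.
    return {src: strands for src, strands in seen.items() if len(strands) > 1}


# Toy data: mm9.chr4 aligns on '-' in the first block and '+' in the second.
blocks = [
    [{"src": "hg18.chr1", "strand": "+"}, {"src": "mm9.chr4", "strand": "-"}],
    [{"src": "hg18.chr1", "strand": "+"}, {"src": "mm9.chr4", "strand": "+"}],
]

print(strand_swaps(blocks))  # reports only mm9.chr4, with both '+' and '-'
```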
Regarding the MultipleSeqAlignment objects produced by get_spliced(), all
annotation properties are lost upon slicing, so it is up to the user to
keep track of where each row came from. I remember we talked about a way
to maintain these annotations even after slicing. Any thoughts?
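As one possible starting point, here is a minimal sketch of carrying MAF-style coordinates through a column slice. It assumes each aligned row is kept as a (sequence, annotation) pair with "start" and "size" keys; the helper name is hypothetical, and this is not how MultipleSeqAlignment slicing currently behaves. The key step is that 'start' and 'size' count bases rather than alignment columns, so gaps must be skipped when recomputing them.

```python
# Sketch only: assumes each aligned row is kept as a (sequence, annotation)
# pair, with MAF-style "start"/"size" keys in the annotation dict. The helper
# name is hypothetical; this is not how MultipleSeqAlignment slicing behaves.

def slice_record(seq, anno, col_start, col_end):
    """Slice an aligned row to columns [col_start, col_end) and recompute
    'start' and 'size' so the annotation still describes the sliced row.

    MAF 'start'/'size' count bases, not gap columns, so the new start is
    shifted by the number of non-gap characters left of the slice. Because a
    '-' strand row is written reverse-complemented and its 'start' counts on
    that reverse complement, the same arithmetic holds for either strand.
    """
    bases_before = len(seq[:col_start].replace("-", ""))
    new_seq = seq[col_start:col_end]
    new_anno = dict(anno)  # copy so the original annotation is untouched
    new_anno["start"] = anno["start"] + bases_before
    new_anno["size"] = len(new_seq.replace("-", ""))
    return new_seq, new_anno


row = "AC-GT-A"
anno = {"start": 100, "size": 5, "strand": "+", "srcSize": 1000}
print(slice_record(row, anno, 2, 5))
# ('-GT', {'start': 102, 'size': 2, 'strand': '+', 'srcSize': 1000})
```

Copying the dict rather than mutating it keeps the unsliced annotation valid, which matters if the same row is sliced several times (e.g. once per exon).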
Thanks,
Andrew