[Biopython-dev] MAF Parser/Writer/Indexer
Sczesnak, Andrew
Andrew.Sczesnak at med.nyu.edu
Sun May 15 19:59:02 UTC 2011
> With something like SAM/BAM (or other assembly formats like ACE or the
> MIRA alignment format also called MAF), you can have multiple
> alignments (the contigs or chromosomes) each with many entries
> (supporting reads). Here there is a clear single reference coordinate
> system, that of the (gapped) reference contigs/chromosomes. This also
> means each alignment has a clear name (the name of the reference
> contig/chromosome), so this name and coordinates can be used for
> indexing (as in samtools).
>
> With MAF however, things are not so easy - any of the sequences could
> be used as a reference (e.g. human chr 1, or mouse chr 2), and any
> region of a sequence might be in more than one alignment.
>
> I'm beginning to suspect what Andrew has in mind is going to be MAF
> specific - so it won't be top level functionality in Bio.AlignIO, but
> rather tucked away in Bio.AlignIO.MafIO instead.
>
> Peter
I agree, the fact that this format does not explicitly define a reference
sequence is problematic. Going by the spec, we ought to be prepared for a
multiz MAF file with several different reference sequences. Practically
speaking, however, the files out there in the world _do_ have a reference
sequence, which appears in every alignment and is the first sequence listed.
While there is definitely some trickiness in how this parser will have to
interact with any API, my feeling is that the MAF-specific portions ought to
be confined to MafIO, while a more general API lives in AlignIO or
elsewhere. This isn't much different from a format like SFF, I think.
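
To make the "first listed sequence is the reference" convention concrete,
here is a minimal sketch in plain Python. It is not the MafIO code and not an
existing Biopython API. The function names index_maf_by_first_sequence and
blocks_overlapping and the EXAMPLE data are hypothetical. It assumes each
alignment block starts with an "a" line and that "s" lines follow the usual
"s src start size strand srcSize text" layout.

```python
import io
from collections import defaultdict


def index_maf_by_first_sequence(handle):
    """Index MAF blocks by their first "s" line (assumed to be the reference).

    Returns {src_name: [(start, end, block_offset), ...]} where start/end are
    zero-based, half-open, forward-strand coordinates and block_offset is the
    handle position of the block's "a" line.
    """
    index = defaultdict(list)
    block_offset = None      # position of the current block's "a" line
    seen_reference = False   # True once the block's first "s" line is read
    # readline() rather than iteration, so tell() stays usable on text files.
    while True:
        offset = handle.tell()
        line = handle.readline()
        if not line:
            break
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "a":
            block_offset = offset
            seen_reference = False
        elif fields[0] == "s" and block_offset is not None and not seen_reference:
            src, strand = fields[1], fields[4]
            start, size, src_size = int(fields[2]), int(fields[3]), int(fields[5])
            if strand == "-":
                # Minus-strand starts count from the reverse-complemented end.
                start = src_size - start - size
            index[src].append((start, start + size, block_offset))
            seen_reference = True
    return dict(index)


def blocks_overlapping(index, name, start, end):
    """Return offsets of blocks whose reference interval overlaps [start, end)."""
    return [off for (s, e, off) in index.get(name, []) if s < end and start < e]


# Hypothetical two-block example; names and coordinates are made up.
EXAMPLE = (
    "##maf version=1\n"
    "a score=10\n"
    "s hg19.chr1 100 10 + 1000 ACGTACGTAC\n"
    "s mm9.chr2  500 10 + 2000 ACGTACGTAC\n"
    "\n"
    "a score=20\n"
    "s hg19.chr1 200 5 + 1000 ACGTA\n"
    "s mm9.chr2  700 5 - 2000 ACGTA\n"
)

index = index_maf_by_first_sequence(io.StringIO(EXAMPLE))
hits = blocks_overlapping(index, "hg19.chr1", 105, 120)
```

Only the first sequence of each block is indexed, so a query against
mm9.chr2 finds nothing here. That is exactly the limitation discussed above:
a file with several different reference sequences, or a query against a
non-reference species, would need a more general index.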
Andrew