[Biopython-dev] MAF Parser/Writer/Indexer

Sczesnak, Andrew Andrew.Sczesnak at med.nyu.edu
Sun May 15 20:14:42 UTC 2011


Hi Brad,

> We may want to take a look at the interval access functionality in
> bx-python and MAF parsing tied in with this:
>
> https://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/interval_index_file.py
> https://bitbucket.org/james_taylor/bx-python/src/tip/lib/bx/align/maf.py

The interval indexing scheme in bx-python looks really nice.  By dropping
intervals into bins, a la the UCSC MySQL tables, and using a compact file
format instead of SQLite, it should be quite fast.
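
For anyone who hasn't looked at the binning idea, here is a minimal sketch of
the standard UCSC scheme (five levels, smallest bins 128 kb, each level 8x
larger, coordinates up to 512 Mb).  It illustrates the general technique only.
It is not bx-python's actual file format, and the function names are my own.

```python
# Standard UCSC binning constants: offsets run from the finest level
# (128 kb bins) up to the single 512 Mb bin.
BIN_OFFSETS = (512 + 64 + 8 + 1, 64 + 8 + 1, 8 + 1, 1, 0)
FIRST_SHIFT = 17      # 2**17 = 128 kb, the smallest bin size
NEXT_SHIFT = 3        # each coarser level is 2**3 = 8 times larger
MAX_END = 1 << 29     # 512 Mb, the limit of the standard scheme


def _check(start, end):
    if not 0 <= start < end <= MAX_END:
        raise ValueError("need 0 <= start < end <= 2**29")


def bin_for_range(start, end):
    """Smallest bin that fully contains the half-open interval [start, end)."""
    _check(start, end)
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        if start_bin == end_bin:
            return offset + start_bin
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    raise AssertionError("not reached for coordinates within 512 Mb")


def overlapping_bins(start, end):
    """All bins that may hold an interval overlapping [start, end)."""
    _check(start, end)
    bins = []
    start_bin = start >> FIRST_SHIFT
    end_bin = (end - 1) >> FIRST_SHIFT
    for offset in BIN_OFFSETS:
        bins.extend(range(offset + start_bin, offset + end_bin + 1))
        start_bin >>= NEXT_SHIFT
        end_bin >>= NEXT_SHIFT
    return bins
```

An index stores each record under bin_for_range(start, end), and a query only
has to inspect the records filed under overlapping_bins(start, end).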

> It would be useful to have an API that queries across bx-python intervals,
> BAM intervals and other formats.

I agree.  It would be great if we could implement some sort of API for
indexing and accessing intervals in SAM/BAM, MAF, ACE, and really any
format that can be made to report an offset and a set of interval coordinates.
Even a multi-FASTA file can have interval information in its headers that a
user could extract and pass to the indexer with a callback function.  Gene
annotation files, like GFF, have this information too.
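
As a rough sketch of what I mean by a callback-driven indexer (all names here
are hypothetical, nothing like this exists in Biopython yet): the indexer walks
a line-oriented file, records the offset of each line, and asks a user-supplied
function for (chrom, start, end).  The FASTA header convention in the example
callback is made up for illustration.

```python
import io


def build_index(handle, get_interval):
    """Map chrom -> sorted list of (start, end, file offset).

    get_interval(line) returns (chrom, start, end) for lines that begin a
    record, or None for lines that should be skipped.
    """
    index = {}
    while True:
        offset = handle.tell()       # offset of the line about to be read
        line = handle.readline()
        if not line:
            break
        interval = get_interval(line)
        if interval is None:
            continue
        chrom, start, end = interval
        index.setdefault(chrom, []).append((start, end, offset))
    for entries in index.values():
        entries.sort()
    return index


def query(index, chrom, start, end):
    """Offsets of records overlapping the half-open interval [start, end)."""
    hits = []
    for rec_start, rec_end, offset in index.get(chrom, []):
        if rec_start >= end:         # sorted by start, so nothing later overlaps
            break
        if rec_end > start:
            hits.append(offset)
    return hits


def fasta_header_interval(line):
    """Example callback for a made-up header style: '>name chr1:100-200'."""
    if not line.startswith(">"):
        return None
    chrom, span = line.split()[1].split(":")
    start, end = span.split("-")
    return chrom, int(start), int(end)


handle = io.StringIO(
    ">a chr1:100-200\nACGT\n>b chr1:150-300\nGGCC\n>c chr2:0-50\nTT\n"
)
index = build_index(handle, fasta_header_interval)
print(query(index, "chr1", 180, 190))
```

A linear scan per chromosome is used here only to keep the sketch short; a real
implementation would presumably use bins or a tree instead.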

What would make the most sense here?  Would a more general interval indexing
and searching module be too much?  One task I find myself performing constantly
is searching various files by chromosome, start, and stop.

Example: A BED file of ChIP-Seq peaks called by MACS--are there any peaks
overlapping gene X?

Example: How many alignments are there in an RNA-Seq BAM file that overlap
rRNA and tRNA annotations in a GFF file, presumably from contaminating RNA?
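
Stripped of file parsing, the second example boils down to something like the
sketch below: count intervals from one set that overlap any interval in
another.  It assumes a single chromosome, half-open coordinates, and plain
(start, end) tuples already extracted from the BAM and GFF files.  The function
names and the toy coordinates are made up for illustration.

```python
from bisect import bisect_right


def merge_intervals(intervals):
    """Merge overlapping or touching half-open intervals into a sorted list."""
    merged = []
    for start, end in sorted(intervals):
        if merged and start <= merged[-1][1]:
            merged[-1][1] = max(merged[-1][1], end)
        else:
            merged.append([start, end])
    return merged


def count_overlapping(reads, features):
    """Count reads that overlap at least one feature."""
    merged = merge_intervals(features)
    ends = [end for _, end in merged]
    count = 0
    for start, end in reads:
        # Index of the first merged feature whose end lies beyond the read start.
        i = bisect_right(ends, start)
        if i < len(merged) and merged[i][0] < end:
            count += 1
    return count


# Toy data: three annotated regions and six alignments.
features = [(100, 200), (150, 250), (1000, 1100)]
reads = [(50, 100), (90, 110), (240, 260), (250, 300), (1099, 1200), (2000, 2100)]
print(count_overlapping(reads, features))
```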


Andrew
