[Biopython-dev] Alignment object
Aaron Quinlan
aaronquinlan at gmail.com
Thu Mar 4 08:33:40 EST 2010
Just an FYI for those interested in developing tools to work with BAM: it may also be worth looking into the BamTools C++ API developed by Derek Barnett at Boston College (http://sourceforge.net/projects/bamtools/). The API is quite nice and has much of the necessary functionality for iterators, getters/setters, etc.
I added BAM support to my BEDTools package (http://code.google.com/p/bedtools/) using the BamTools libraries. Save for a few minor bugs along the way, it was rather straightforward to include.
Aaron
Aaron Quinlan, Ph.D.
NRSA Postdoctoral Fellow
Hall Laboratory
University of Virginia
Biochem. & Mol. Genetics
aaronquinlan at gmail.com
On Mar 4, 2010, at 8:13 AM, Brad Chapman wrote:
> Kevin and Peter;
>
>> I was aware of pysam but am concerned about the dependencies:
>> pyrex 0.9.8 or later, python 2.6 or later, plus of course SAMtools
>> itself - which may all be fine on Linux, but will likely be trouble for
>> us on other platforms (especially Windows).
>
> I believe you can remove the pyrex requirement by shipping the
> generated C file with the distribution. Samtools itself may be an
> issue; however, right now it is probably a practical need for dealing
> with SAM/BAM since it implements a lot of BAM generation, sorting,
> merging and indexing you need in workflows. Also, the C code is
> included with the distribution so it is more a matter of getting it
> compiled than introducing extra dependencies. The bioconductor work
> appears to do the same thing.
>
>>> I agree that we should work towards supporting SAM (and perhaps
>>> also BAM) in Biopython, and other projects' APIs can be very
>>> useful for inspiration or guidance.
>
> All of my work converts SAM directly into sorted and indexed BAM,
> and then builds from that. For me, direct SAM parsing wouldn't be as
> useful as BAM.
>
>> Honestly, the SAM/BAM format specification is pretty dodgy. Thankfully
>> between samtools and Picard source code, I've been able to work out most of
>> the tricky bits. I'm glad to know that the R folks are also working on
>> this, since they're usually very good about generating clear documentation.
>
> Agreed, but at least we are converging on something instead of
> having to write a parser every time you use a new aligner. The
> bioconductor SVN is here:
>
> https://hedgehog.fhcrc.org/bioconductor/trunk/madman/Rpacks/Rsamtools/
> (user: readonly, pass: readonly)
>
> I think the pysam API does a decent job for reading and exposing
> this. The higher level things that would be nice to add are:
>
> - Converting the CIGAR string into something more useful.
> - Smartly dealing with the X? fields from various aligners. These
> often contain very useful information missing from the SAM
> specification. Where the data actually is will be aligner
> specific.
> - More generally, making the optional fields easier to work with.
>
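To make the wish list above a bit more concrete, here is a rough sketch of what "something more useful" than a raw CIGAR string and raw optional fields might look like. The function names are made up for illustration and are not part of pysam or Biopython; the operation letters and the TAG:TYPE:VALUE layout follow the current SAM specification, and only the simple i/f value types are converted.

```python
import re

# CIGAR operation letters as listed in the SAM specification.
CIGAR_FULL = re.compile(r"(\d+[MIDNSHP=X])+")
CIGAR_PART = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string into a list of (length, operation) tuples."""
    if cigar == "*":  # "*" means no CIGAR is available
        return []
    if not CIGAR_FULL.fullmatch(cigar):
        raise ValueError("malformed CIGAR string: %r" % cigar)
    return [(int(n), op) for n, op in CIGAR_PART.findall(cigar)]


def reference_span(ops):
    """Number of reference bases covered (M, D, N, = and X consume reference)."""
    return sum(n for n, op in ops if op in "MDN=X")


def parse_tags(fields):
    """Turn optional fields such as 'NM:i:1' into a {tag: value} dict.

    Only integer (i) and float (f) values are converted; every other
    type is kept as a string, so aligner-specific X? tags pass through.
    """
    converters = {"i": int, "f": float}
    tags = {}
    for field in fields:
        tag, typ, value = field.split(":", 2)
        tags[tag] = converters.get(typ, str)(value)
    return tags
```

With something like this, parse_cigar("3M1I3M1D5M") gives a list of tuples that is easy to walk, and the X? fields end up as ordinary dictionary entries that aligner-specific code can interpret later.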
>> Parsing SAM is pretty simple and I can certainly help with gluing it into
>> Biopython (with some help on the Biopython side, since I'm still a newb).
>> I'm about half-way to having a BAM reader and writer for my own purposes.
>> I'm coding the time-critical parts in Cython with a fallback to pure
>> Python, so it may not be ideal for use in Biopython.
>
> Cool. Does the BAM reader require samtools C code or is it
> independent of that?
>
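On the question of independence from the samtools C code: BAM's BGZF container is a series of ordinary gzip members, so at least the header can be read with nothing beyond the Python standard library. The sketch below assumes the header layout given in the BAM specification (magic, header text, then the reference names and lengths); read_bam_header is an illustrative name, not an existing API, and alignment records and random access through the index are left out.

```python
import gzip
import io
import struct


def read_bam_header(handle):
    """Return (header_text, [(ref_name, ref_length), ...]) from a BAM stream."""
    with gzip.GzipFile(fileobj=handle) as bam:
        if bam.read(4) != b"BAM\x01":
            raise ValueError("not a BAM file")
        (l_text,) = struct.unpack("<i", bam.read(4))
        text = bam.read(l_text).rstrip(b"\x00").decode("utf-8")
        (n_ref,) = struct.unpack("<i", bam.read(4))
        refs = []
        for _ in range(n_ref):
            (l_name,) = struct.unpack("<i", bam.read(4))
            # Reference names are stored NUL-terminated.
            name = bam.read(l_name).rstrip(b"\x00").decode("utf-8")
            (l_ref,) = struct.unpack("<i", bam.read(4))
            refs.append((name, l_ref))
    return text, refs


# Usage with a tiny in-memory BAM header (one reference, no alignments).
header = b"@HD\tVN:1.0\n"
raw = (b"BAM\x01" + struct.pack("<i", len(header)) + header
       + struct.pack("<i", 1)
       + struct.pack("<i", 5) + b"chr1\x00" + struct.pack("<i", 1000))
demo_text, demo_refs = read_bam_header(io.BytesIO(gzip.compress(raw)))
```

Reading is the easy half; writing valid BGZF blocks and using the index for random access need more care, which is where the samtools code earns its keep.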
> Brad
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython-dev