[Biopython-dev] [Bug 2905] Short read alignment format SAM / BAM

Vince S. Buffalo vsbuffalo at gmail.com
Thu Jul 15 20:05:12 UTC 2010


Our group has used the SAM format on various projects, parsing CIGAR strings
to find hybrid mapped reads. We primarily use the pileup format in
looking for SNP candidates and in differential expression analysis with
RNA-seq. cDNA reads are mapped back to a reference transcriptome, and then
we parse the pileup format to form counts for transcripts, which then go to
R for differential expression analysis. As we look towards pipelining some
common tasks, it would be nice if pysam's functionality were in Biopython.
Also, I wonder if other folks work with the pileup format as frequently as
we do - if so, this may be a worthy candidate for a parser.
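
As a rough illustration of the two parsing tasks mentioned above, here is a minimal pure-Python sketch. It is not pysam's or Biopython's API: the function names parse_cigar and depth_per_reference are hypothetical, and the pileup part assumes the basic tab-separated layout (reference name, position, reference base, depth, read bases, qualities). Summing the depth column per transcript is only one possible way to form counts, and may not match what a given pipeline does.

```python
import re
from collections import defaultdict

# One CIGAR operation: a length followed by an operation code from the SAM spec.
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def parse_cigar(cigar):
    """Split a CIGAR string such as '30M2I18M' into (length, op) tuples.

    Hypothetical helper for illustration only.
    """
    if cigar == "*":  # SAM uses '*' when the CIGAR is unavailable
        return []
    ops = [(int(length), op) for length, op in CIGAR_OP.findall(cigar)]
    # Rebuild the string to check that the regex consumed all of it.
    if "".join("%d%s" % pair for pair in ops) != cigar:
        raise ValueError("Malformed CIGAR string: %r" % cigar)
    return ops


def depth_per_reference(pileup_lines):
    """Sum the depth column (4th field) of pileup lines per reference name.

    Assumes the basic six-column pileup layout; other pileup variants
    put the depth in a different column.
    """
    totals = defaultdict(int)
    for line in pileup_lines:
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 4:
            continue  # skip blank or truncated lines
        totals[fields[0]] += int(fields[3])
    return dict(totals)
```

For example, parse_cigar("30M2I18M") yields three operations (a 30-base match, a 2-base insertion, an 18-base match), and the per-transcript totals from depth_per_reference could be written out as a table for R.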


I'll look into BioLib and EMBOSS, thanks Peter.

Vince

On Thu, Jul 15, 2010 at 11:35 AM, Peter <biopython at maubp.freeserve.co.uk> wrote:

> On Thu, Jul 15, 2010 at 6:20 PM, Vince S. Buffalo wrote:
> > Sorry to bump this old topic, but are there plans to merge this into the
> > main project? I do a lot of processing with the SAM format and it would be
> > great to use Biopython for this.
> >
> > Does the pure Python implementation run as quickly as the pysam
> > version? Is anyone still considering forking pysam and rewriting the
> > C wrappers?
> >
> > Vince
>
> EMBOSS now has limited SAM/BAM support,
> http://lists.open-bio.org/pipermail/emboss-dev/2010-July/000656.html
>
> BioLib is also now taking an interest in SAM/BAM support,
> I'd expect to see something on their mailing list soon:
> http://biolib.open-bio.org/wiki/Main_Page
>
> Can I ask what you want to do with SAM/BAM files?
>
> I did quite a bit of exploratory work for SAM/BAM in SeqIO,
> focussing on the raw reads (not the alignment side). This
> is very different from what you can do with PySam. It has
> allowed me to do SAM/BAM back to FASTQ which has been
> helpful in real work. There are branches on github, but still
> quite experimental and not necessarily going to be committed:
> http://github.com/peterjc/biopython/tree/seqio-sam-bam
> http://github.com/peterjc/biopython/tree/seqio-sam-bam-index
>
> Peter
>



-- 
Vince Buffalo
Programmer
Bioinformatics Core
UC Davis Genome Center
University of California, Davis

"There's real poetry in the real world. Science is the poetry of reality."
-Richard Dawkins


