[Biopython-dev] [Bug 2905] Short read alignment format SAM / BAM

Vince S. Buffalo vsbuffalo at gmail.com
Thu Jul 15 17:20:26 UTC 2010


Sorry to bump this old topic, but are there plans to merge this into the
main project? I do a lot of processing with the SAM format and it would be
great to use Biopython for this.

Does the pure Python implementation run as quickly as the pysam version? Is
anyone still considering forking pysam and rewriting the C wrappers?

Vince

On Fri, May 14, 2010 at 10:05 AM, <bugzilla-daemon at portal.open-bio.org> wrote:

> http://bugzilla.open-bio.org/show_bug.cgi?id=2905
>
> ------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk 2010-05-14 13:05 EST -------
> The code on my branch has been updated, and now supports SAM and BAM parsing
> (currently it only extracts the read name, sequence and quality scores),
> indexing by name with Bio.SeqIO.index(), and fast conversion to FASTA or
> Sanger FASTQ with Bio.SeqIO.convert(), which is handy for redoing a mapping:
>
> http://github.com/peterjc/biopython/tree/seqio-sam-bam
>
> Note that suffixes of "/1" or "/2" are added to forward or reverse read
> names to make them unique. This matches the Illumina pipeline convention
> and is handled by most tools which take paired end data.
>
> I'm actually using this code at the moment: I've started with BAM files of
> paired end Illumina transcriptome reads mapped onto a draft assembly. I then
> used the convert code to convert these to FASTQ files, then split them into
> a pair of FASTQ files (forward and reverse) and used BWA to remap them to a
> different reference (giving new SAM files).
>
>
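
For anyone following the thread, the SAM-to-FASTQ step described in the quoted
comment can be roughly sketched in plain Python. This is not the code on the
seqio-sam-bam branch and it does not use Bio.SeqIO; the function names
sam_to_fastq_records() and split_pairs() are made up for illustration. It
assumes uncompressed SAM text, uses the FLAG bits from the SAM specification
to choose the "/1" or "/2" suffix, and (my assumption, the comment does not
say what the branch does) restores the original read orientation for reads
mapped to the reverse strand.

```python
import io

# SAM FLAG bits, as defined in the SAM specification.
FLAG_REVERSE = 0x10        # SEQ is stored reverse-complemented
FLAG_READ1 = 0x40          # first read of a pair
FLAG_READ2 = 0x80          # second read of a pair
FLAG_SECONDARY = 0x100     # secondary alignment
FLAG_SUPPLEMENTARY = 0x800 # supplementary alignment

_COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def sam_to_fastq_records(handle):
    """Yield (name, sequence, quality) tuples from a SAM text handle.

    Hypothetical helper: only the read name, sequence and quality string
    are extracted, and paired reads get a "/1" or "/2" suffix.
    """
    for line in handle:
        if line.startswith("@") or not line.strip():
            continue  # skip header lines and blank lines
        fields = line.rstrip("\n").split("\t")
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        if flag & (FLAG_SECONDARY | FLAG_SUPPLEMENTARY):
            continue  # emit each read once, from its primary record
        if flag & FLAG_REVERSE:
            # Assumption: undo the reverse complement applied by the mapper.
            seq = seq.translate(_COMPLEMENT)[::-1]
            qual = qual[::-1]
        if flag & FLAG_READ1:
            name += "/1"
        elif flag & FLAG_READ2:
            name += "/2"
        yield name, seq, qual


def split_pairs(records):
    """Return (forward, reverse, unpaired) FASTQ text, split on the suffix."""
    forward, reverse, unpaired = [], [], []
    for name, seq, qual in records:
        entry = "@%s\n%s\n+\n%s\n" % (name, seq, qual)
        if name.endswith("/1"):
            forward.append(entry)
        elif name.endswith("/2"):
            reverse.append(entry)
        else:
            unpaired.append(entry)
    return "".join(forward), "".join(reverse), "".join(unpaired)


# Two made-up records for one read pair (FLAG 99 = read 1, 147 = read 2).
example = (
    "@HD\tVN:1.0\n"
    "frag1\t99\tref\t1\t60\t4M\t=\t10\t13\tACGT\tIIII\n"
    "frag1\t147\tref\t10\t60\t4M\t=\t1\t-13\tAACC\tABCD\n"
)
records = list(sam_to_fastq_records(io.StringIO(example)))
forward, reverse, unpaired = split_pairs(records)
```

The forward and reverse strings could then be written to two FASTQ files and
handed to a mapper such as BWA. Since SAM stores qualities as Phred+33, the
output is Sanger FASTQ. Records without a stored sequence or quality ("*")
are passed through unchanged here, which a real converter would need to
handle.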



-- 
Vince Buffalo
Programmer
Bioinformatics Core
UC Davis Genome Center
University of California, Davis

"There's real poetry in the real world. Science is the poetry of reality."
-Richard Dawkins


