[Biopython-dev] [Bug 2905] Short read alignment format SAM / BAM

bugzilla-daemon at portal.open-bio.org
Fri May 14 13:05:06 EDT 2010


http://bugzilla.open-bio.org/show_bug.cgi?id=2905

------- Comment #7 from biopython-bugzilla at maubp.freeserve.co.uk  2010-05-14 13:05 EST -------
The code on my branch has been updated:

http://github.com/peterjc/biopython/tree/seqio-sam-bam

It now supports SAM and BAM parsing (currently extracting only the read
name, sequence and quality scores), indexing by read name with
Bio.SeqIO.index(), and fast conversion to FASTA or Sanger FASTQ with
Bio.SeqIO.convert(), which is handy for redoing a mapping.

Note that a suffix of "/1" or "/2" is added to the names of forward and
reverse reads respectively, to make them unique. This matches the Illumina
pipeline convention and is handled by most tools that take paired-end data.
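
To illustrate the naming convention, here is a minimal standard-library
sketch. It is not the code on the branch. It assumes the read's position in
the pair is taken from the SAM FLAG field (bit 0x40 for the first read, bit
0x80 for the second, as defined in the SAM specification), and the function
name sam_to_fastq is hypothetical. SAM quality strings already use the
Sanger (Phred+33) encoding, so they can be copied into FASTQ unchanged.

```python
import io


def sam_to_fastq(sam_handle, fastq_handle):
    """Write one FASTQ record per SAM alignment line; return the record count.

    Hypothetical helper for illustration only (not the branch's code).
    """
    count = 0
    for line in sam_handle:
        if line.startswith("@"):
            continue  # SAM header line, not an alignment
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 11:
            continue  # fewer than the 11 mandatory SAM columns
        name, flag = fields[0], int(fields[1])
        seq, qual = fields[9], fields[10]
        # Assumption: the pair member is read from the FLAG bits.
        if flag & 0x40:
            name += "/1"  # first read in the pair
        elif flag & 0x80:
            name += "/2"  # second read in the pair
        fastq_handle.write("@%s\n%s\n+\n%s\n" % (name, seq, qual))
        count += 1
    return count


if __name__ == "__main__":
    demo = (
        "@HD\tVN:1.0\n"
        "read1\t67\tref\t1\t30\t4M\t=\t10\t13\tACGT\tIIII\n"
        "read1\t131\tref\t10\t30\t4M\t=\t1\t-13\tTTGA\t!!!!\n"
    )
    out = io.StringIO()
    sam_to_fastq(io.StringIO(demo), out)
    print(out.getvalue(), end="")
```

The sketch copies SEQ and QUAL exactly as stored. SAM stores reads mapped to
the reverse strand (FLAG bit 0x10) reverse complemented, and this sketch does
not undo that, nor does it skip secondary alignments.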

I'm actually using this code at the moment. I started with BAM files of
paired-end Illumina transcriptome reads mapped onto a draft assembly,
converted them to FASTQ with the convert code, split each FASTQ file into a
pair of files (forward and reverse), and used BWA to remap the reads to a
different reference (giving new SAM files).
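
The splitting step could be sketched roughly as follows, using only the
standard library. This is an assumption about how one might do it, not the
script I actually ran: it assumes simple four-line FASTQ records and read
names ending in "/1" or "/2", and the names read_fastq and split_pairs are
hypothetical.

```python
import io


def read_fastq(handle):
    """Yield (title, sequence, quality) tuples from four-line FASTQ records."""
    while True:
        title = handle.readline()
        if not title:
            return  # end of file
        if not title.strip():
            continue  # tolerate blank lines between records
        seq = handle.readline().rstrip("\n")
        handle.readline()  # the '+' separator line
        qual = handle.readline().rstrip("\n")
        yield title.rstrip("\n")[1:], seq, qual


def split_pairs(fastq_handle, forward_handle, reverse_handle):
    """Route reads to the forward or reverse file by their name suffix.

    Returns (forward_count, reverse_count, skipped_count).
    """
    forward = reverse = skipped = 0
    for title, seq, qual in read_fastq(fastq_handle):
        # The read name is the title up to the first whitespace.
        name = title.split(None, 1)[0] if title.strip() else ""
        record = "@%s\n%s\n+\n%s\n" % (title, seq, qual)
        if name.endswith("/1"):
            forward_handle.write(record)
            forward += 1
        elif name.endswith("/2"):
            reverse_handle.write(record)
            reverse += 1
        else:
            skipped += 1  # no pair suffix, so written to neither file
    return forward, reverse, skipped


if __name__ == "__main__":
    demo = "@r1/1\nACGT\n+\nIIII\n@r1/2\nTTGA\n+\n!!!!\n"
    fwd, rev = io.StringIO(), io.StringIO()
    print(split_pairs(io.StringIO(demo), fwd, rev))
```

Input order is preserved in both output files, but the sketch does not check
that every forward read has a matching reverse read, which paired-end mappers
generally expect.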



