[Biopython] retrieving paired sequences from BAM

Mic mictadlo at gmail.com
Mon Feb 20 01:37:59 EST 2012


Hello,
I am trying to retrieve paired end sequences from BAM file. However, I am
only able to retrieve the A sequence with the following code:
''''''''''''''''''''''''''''''''''''''''
import pysam
samfile = pysam.Samfile("b.sorted.bam", "rb")
for read in samfile.fetch():
        if read.is_paired:
                print read.qname
                print read.seq
samfile.close()

$ python a.py | head -n 2
HWI-ST226_0154:4:1206:12773:170407#CTTGTA
AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT
''''''''''''''''''''''''''''''''''''''''

How is it possible to retrieve also the B sequence?
What is the difference between is_proper_pair and is_paired?

-------------
$ grep -A 1 'HWI-ST226_0154:4:1206:12773:170407#CTTGTA' X_A.fastq
@HWI-ST226_0154:4:1206:12773:170407#CTTGTA/1
AGTATAAAACTAAGCAAACTGTTAGAACTTTGATTACGTTTTGTTTATCAGTGATACGCAAAAGTTTAAGATCCTTGAGTACCTCTTTCGATGGCGGATT

$ grep -A 1 'HWI-ST226_0154:4:1206:12773:170407#CTTGTA' X_B.fastq
@HWI-ST226_0154:4:1206:12773:170407#CTTGTA/2
GGCGCTGTGACTGAAGTCCTCAGATTTCGGTACGGTTTTGTCTATTTCTGGGTTCCTGCGGAAACACCTTCTCGATTATTTTCTAATCTCAATTAGGTTT
-------------

Thank you in advance.

Cheers


More information about the Biopython mailing list