[Biopython] [Samtools-help] Segmentation fault
Mic
mictadlo at gmail.com
Tue Oct 18 10:26:06 UTC 2011
Hello,
Thank you for your email. I updated the code and find out that
print reads['chr1'] #works fine
but
print reads['chr1'][0] #caused Segmentation fault
Please find below the updated code:
from Bio import SeqIO
import pysam
from optparse import OptionParser
import subprocess, os, sys
from multiprocessing import Pool
import functools
def GetReferenceInfo(referenceFastaPath):
referencenames = []
referencelengths = []
referenceFastaFile = open(referenceFastaPath)
for record in SeqIO.parse(referenceFastaFile, "fasta"):
referencenames.append(record.name)
referencelengths.append(len(record.seq))
referenceFastaFile.close()
return (referencenames, referencelengths)
def GenerateSubsetBAM(bam_filename, ref_name):
reads = []
bam_fh = pysam.Samfile(bam_filename, "rb")
for read in bam_fh.fetch(ref_name):
reads.append(read)
print ref_name + ' Done ' + str(len(reads))
return (ref_name, reads)
if __name__ == '__main__':
parser = OptionParser("Usage: %prog -b BAMfile -f new_reference_fasta -o
outputBAM")
parser.add_option("-b", "--BAM", type="string", dest="inputBAMFilepath",
help="Specify a BAM file")
parser.add_option("-f", "--fasta", type="string", dest="fastaFilepath",
help="Specify a reference fasta file.")
parser.add_option("-o", "--output", type="string",
dest="outputBAMFilepath", help="Specify an output BAM file.")
(opts, args) = parser.parse_args()
if (opts.inputBAMFilepath is None):
print ("\nSpecify a BAM file. eg. -b large.bam\n")
parser.print_help()
elif not(os.path.exists(opts.inputBAMFilepath)):
print ("\nReference BAM file does not exists: " + opts.inputBAMFilepath
+"\n")
elif (opts.fastaFilepath is None):
print ("\nSpecify a reference fasta file. eg. -f Subset.fasta\n")
parser.print_help()
elif not(os.path.exists(opts.fastaFilepath)):
print ("\nReference fasta file does not exists: " + opts.fastaFilepath
+"\n")
elif os.path.exists(opts.outputBAMFilepath) and
not(raw_input(opts.outputBAMFilepath + " exists. Overwrite? (Y/n): ")=='Y'):
print ("\nOutput BAM exists. Please specify alternative output file.
eg. -o Subset.bam\n")
else:
print "Read fasta ..."
(ref_names, ref_lengths) = GetReferenceInfo(opts.fastaFilepath)
print 'Done!'
print "creating subset...."
pool = Pool()
GenerateSubsetBAM_wrapper = functools.partial(GenerateSubsetBAM,
opts.inputBAMFilepath)
reads = dict(pool.imap_unordered(GenerateSubsetBAM_wrapper, ref_names))
pool.close()
print "Done!"
print reads['chr1'] #works fine
print "xxxxx"
print reads['chr1'][0] #caused Segmentation fault
I run the code with the pysam-0.5 examples (pysam-0.5/tests) in the
following way:
python SubsetBAM-P.py --BAM ex1.bam -f ex1.fa -o new.bam
Read fasta ...
Done!
creating subset....
chr1 Done 1464
chr2 Done 1806
Done!
[<csamtools.AlignedRead object at 0x2b975635d168>, ...,
<csamtools.AlignedRead object at 0x2b35d89b6ca8>]
xxxxx
Segmentation fault
Thank you in advance.
On Tue, Oct 18, 2011 at 7:00 PM, Peter Cock <p.j.a.cock at googlemail.com>wrote:
> On Tue, Oct 18, 2011 at 4:44 AM, Mic <mictadlo at gmail.com> wrote:
> > Hello,
> > I have tried to generate a subset BAM, but I get a 'Segmentation fault'
> with
> > the following code:
> > from Bio import SeqIO
> > import pysam
> > from optparse import OptionParser
> > import subprocess, os, sys
> > from multiprocessing import Pool
> > import functools
> > ...
>
> I tried this and it seemed to get stuck much earlier. Could you
> cut down the example a bit by removing the multiprocessing?
>
> Peter
>
> P.S. Also you can remove the unused "import argparse" line.
>
More information about the Biopython
mailing list