[Biopython] multiprocessing problem with pysam

Michal mictadlo at gmail.com
Mon Apr 11 11:57:17 UTC 2011


On 04/10/2011 09:15 PM, Brad Chapman wrote:
> Michal;
>
>> I have tried to rewrite the following code from
>> http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/api.html
> [...]
>> with the following multiprocessing code:
> [...]
>>      pool = Pool()
>>
>>      samfile = pysam.Samfile("ex1.bam", "rb")
>>      references = samfile.references
>>
>>      for reference in samfile.references:
>>          print ">", reference
>>          pool.apply_async(calc_pileup, [samfile, reference, 100, 120])
> [...]
>> However, I got the following output:
> [...]
>> TypeError: _open() takes at least 1 positional argument (0 given)
> You are passing the open file handle 'samfile' to your multiprocessing
> function. The arguments you pass through need to be able to be pickled
> by Python; normally you need to stick with more basic data structures.
> Specifically, I would suggest passing in the filename and then opening a
> pysam reference within the worker functions.
>
> def calc_pileup(fname, reference_name, start_pos, end_pos):
>      samfile = pysam.Samfile(fname, "rb")
>      coverages = []
>      print reference_name, os.getpid()
>
> if __name__ == '__main__':
>      pool = Pool()
>      fname = "ex1.bam"
>      samfile = pysam.Samfile(fname, "rb")
>      references = samfile.references
>      samfile.close()
>      for reference in references:
>          print ">", reference
>          pool.apply_async(calc_pileup, [fname, reference, 100, 120])
>
> My more general suggestion with multiprocessing is to start with a
> simple workflow and expand. This will let you get a sense of where
> your objects may be too complex to pickle and you need to simplify.
>
> Hope this helps,
> Brad
>
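
The pickling constraint described above can be checked directly before anything is handed to the pool. The following is only a minimal sketch: `is_picklable` is a hypothetical helper name, and the handle opened on `os.devnull` is a stand-in for any open file object, not for a pysam object specifically.

```python
import os
import pickle


def is_picklable(obj):
    """Return True if pickle can serialize obj, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        # pickle raises different exception types depending on the object
        # (TypeError, pickle.PicklingError, ...), so catch broadly here.
        return False


fname = "ex1.bam"                # a plain string: safe to pass to a worker
handle = open(os.devnull, "rb")  # stand-in for any open file handle
try:
    name_ok = is_picklable(fname)
    handle_ok = is_picklable(handle)
finally:
    handle.close()
```

The filename passes the check and the open handle does not, which is why the worker opens the file itself.
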
Thank you for your response. I changed the code in the following way:

--------------------------
import pysam
import os
from multiprocessing import Pool
from pprint import pprint

class Pileup_info():
     def __init__(pileup_pos, coverage):
         self.pileup_pos = pileup_pos
         self.coverage = coverage

     reads = []

class Reads_info():
     def __init__(read_name, read_base):
         self.read_name = read_name
         self.read_base = read_base

def calc_pileup(fname, reference_name, start_pos, end_pos):
     samfile = pysam.Samfile(fname, "rb")
     coverages = []
     print reference_name, os.getpid()
     for pileupcolumn in samfile.pileup(reference_name, start_pos, end_pos):
          pileup_inf = Pileup_info(pileupcolumn.pos, pileupcolumn.n)
          #print 'coverage at base %s = %s' % (pileupcolumn.pos , pileupcolumn.n)
          for pileupread in pileupcolumn.pileups:
               #print '\tbase in read %s = %s' % (pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos])
               pileup_inf.reads.append(Reads_info(pileupread.alignment.qname, pileupread.alignment.seq[pileupread.qpos]))
          coverages.append(pileup_inf)

     samfile.close()
     return (reference_name, coverages)

def output(coverage):
     #for
     print
     print

if __name__ == '__main__':
     pool = Pool()

     fname = "ex1.bam"
     samfile = pysam.Samfile(fname, "rb")
     references = samfile.references
     samfile.close()

     results = [pool.apply_async(calc_pileup, [fname, reference, 100, 120]) for reference in references]
         #print ">", reference
         #results = pool.apply_async(calc_pileup, [fname, reference, 100, 120])

     pool.close()
     pool.join()
     for r in results:
         print r
         pprint(r.get())
--------------------------

and I got this error:

--------------------------
$ python multi.py
chr1 6056
chr2 6057
<multiprocessing.pool.ApplyResult object at 0xeb7bd0>
Traceback (most recent call last):
   File "multi.py", line 54, in <module>
     pprint(r.get())
   File "/home/mictadlo/apps/python/lib/python2.7/multiprocessing/pool.py", line 491, in get
     raise self._value
TypeError: __init__() takes exactly 2 arguments (3 given)
--------------------------

What did I do wrong?
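
For what it is worth, the traceback is consistent with the `__init__` methods of `Pileup_info` and `Reads_info` being declared without `self`: Python passes the new instance as the first argument, so a two-parameter `__init__` called with two values receives three. A minimal sketch of the two classes with that corrected follows. It also moves `reads` into `__init__` so that each instance gets its own list, which is an assumption about the intent rather than something the error requires, and the values at the end are made up for illustration.

```python
class Reads_info(object):
    def __init__(self, read_name, read_base):
        # `self` must be the first parameter of every instance method.
        self.read_name = read_name
        self.read_base = read_base


class Pileup_info(object):
    def __init__(self, pileup_pos, coverage):
        self.pileup_pos = pileup_pos
        self.coverage = coverage
        # Per-instance list; a class-level `reads = []` would be shared
        # by every Pileup_info object.
        self.reads = []


# Hypothetical values standing in for two pileup columns and one read.
first = Pileup_info(100, 1)
second = Pileup_info(101, 2)
first.reads.append(Reads_info("read_1", "A"))
```

With the class-level list, the read appended to `first` would also show up in `second.reads`; with the per-instance list it does not.
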