[Biopython] multiprocessing problem with pysam
Michal
mictadlo at gmail.com
Mon Jun 6 21:59:39 UTC 2011
On 05/16/2011 01:53 AM, Brad Chapman wrote:
> Michal;
>
> [multiprocessing]
> multiprocessing is sensitive to passing or calling complex class
> objects. My suggestion is to use functions without associated state
> attributes and pass in your information as standard python objects
> (strings, lists, dicts). I use a little decorator to make writing
> the functions passed easier:
>
> import functools
> def map_wrap(f):
>     @functools.wraps(f)
>     def wrapper(*args, **kwargs):
>         return apply(f, *args, **kwargs)
>     return wrapper
>
> Then you would write your function as:
>
> @map_wrap
> def run_test(bam_filename, cultivars, ref_name):
>     bam_fh = pysam.Samfile(bam_filename, "rb")
>     print os.getpid(), ref_name, cultivars
>     return (os.getpid(), ref_name)
>
> and call it with:
>
> cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
> bam_filename = "/media/usb/tests/test.bam"
> bamfile = pysam.Samfile(bam_filename, "rb")
> ref_names = bamfile.references
> bamfile.close()
>
> pool = Pool()
> results = dict(pool.imap(run_test, ((bam_filename, cultivars, ref)
>                                     for ref in ref_names)))
> pool.close()
>
> Hope this helps,
> Brad
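For readers on Python 3, where the apply() builtin no longer exists, here is a minimal sketch of the same idea: pass only plain, picklable objects to the worker and let Pool.starmap unpack each argument tuple. This is an illustration, not Brad's code. The build_jobs helper, the file name and the reference names are hypothetical, the pysam call is left out so the sketch runs without a BAM file, and the result is keyed by reference name rather than by process ID.

```python
import os
from multiprocessing import Pool


def run_test(bam_filename, cultivars, ref_name):
    # Only plain objects (str, list) cross the process boundary. A real
    # worker would open the BAM file here instead of receiving a handle.
    return (ref_name, os.getpid())


def build_jobs(bam_filename, cultivars, ref_names):
    # Hypothetical helper: one argument tuple per reference name.
    return [(bam_filename, cultivars, ref) for ref in ref_names]


if __name__ == "__main__":
    # Placeholder inputs for illustration only.
    jobs = build_jobs("test.bam", ["Ja", "Ea", "As"], ["chr1", "chr2"])
    with Pool() as pool:
        # starmap unpacks each tuple into positional arguments, which is
        # the job the apply-based decorator did in Python 2.
        results = dict(pool.starmap(run_test, jobs))
    print(results)
```

Because the worker is a module-level function and its arguments are strings and lists, nothing here depends on pickling an open file handle or a class instance.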
Thank you, Brad, it works. I also found the following solution:
import os
from multiprocessing import Pool
from pprint import pprint
import functools

def calc_p(fname, start_pos, end_pos, reference_name):
    print os.getpid()
    print "fname", fname
    print "reference_name", reference_name
    print "start_pos", start_pos
    print "end_pos", end_pos
    print
    return (reference_name, [os.getpid(), 'x1', 'x2'])

if __name__ == '__main__':
    pool = Pool()
    fname = "ex1.txt"
    references = ['Test1', 'Test2', 'Test3', 'Test4']
    run_test = functools.partial(calc_p, fname, 100, 120)
    result = dict(pool.imap_unordered(run_test, references))
    pprint(result)
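
The functools.partial approach above is written for Python 2. A minimal sketch of how it might look on Python 3 follows, assuming the same placeholder file name and reference names. The print statements are dropped and the pool is managed with a with block; both are choices made for this sketch.

```python
import functools
import os
from multiprocessing import Pool
from pprint import pprint


def calc_p(fname, start_pos, end_pos, reference_name):
    # The first three arguments are fixed by functools.partial; only
    # reference_name varies from task to task.
    return (reference_name, [os.getpid(), "x1", "x2"])


if __name__ == "__main__":
    references = ["Test1", "Test2", "Test3", "Test4"]
    # partial objects are picklable as long as the wrapped function is a
    # module-level function and the bound arguments are picklable.
    run_test = functools.partial(calc_p, "ex1.txt", 100, 120)
    with Pool() as pool:
        # imap_unordered yields results as workers finish; dict() keys
        # them by reference name, so the arrival order does not matter.
        result = dict(pool.imap_unordered(run_test, references))
    pprint(result)
```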
Michal