[Biopython] multiprocessing problem with pysam
Brad Chapman
chapmanb at 50mail.com
Sun May 15 11:53:46 EDT 2011
Michal;
[multiprocessing]
> class Test():
>     def __init__(self, bam_filename, cultivars):
>         self.__bam_fh = pysam.Samfile(bam_filename, "rb")
>         self.__cultivars = cultivars
>
>     def run(self, ref_name):
>         print os.getpid(), ref_name, self.__cultivars
>         return (os.getpid(), ref_name)
[...]
> pool = Pool()
> results = dict(pool.imap_unordered(
>     Test(bam_filename, cultivars).run, ref_names))
[...]
> and got the following error:
>
> Exception in thread Thread-2:
[...]
> PicklingError: Can't pickle <type 'instancemethod'>: attribute
> lookup __builtin__.instancemethod failed
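multiprocessing has to pickle the function and the arguments it sends
to the worker processes, so it is sensitive to passing or calling
complex class objects. Bound methods like Test(...).run can't be
pickled in Python 2, which is what the PicklingError above reports.
My suggestion is to use functions without associated state
attributes and pass in your information as standard Python objects
(strings, lists, dicts). I use a small decorator to make the
functions passed to the pool easier to write: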
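import functools
import os
from multiprocessing import Pool
import pysam

def map_wrap(f):
    """Let f take its arguments as one tuple, the way Pool.imap passes them."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # apply(f, args_tuple) calls f(*args_tuple) (Python 2 builtin)
        return apply(f, *args, **kwargs)
    return wrapper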
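The apply builtin used in map_wrap only exists in Python 2. As a minimal sketch for Python 3 (an assumption, since the code in this thread is Python 2), the decorator can unpack the tuple with * instead; describe is a hypothetical worker that only shows the unpacking. functools.wraps also matters for multiprocessing: pickle sends a module-level function by its module and qualified name, and wraps copies those from f, so the wrapper is recorded under the same name the decorated function is bound to at module level.

```python
import functools


def map_wrap(f):
    """Python 3 sketch: let f take all of its arguments as one tuple."""
    @functools.wraps(f)  # copies __module__, __name__, __qualname__, __doc__
    def wrapper(args, **kwargs):
        # Python 3 has no apply(); f(*args) unpacks the tuple the same way
        return f(*args, **kwargs)
    return wrapper


@map_wrap
def describe(bam_filename, cultivars, ref_name):
    """Hypothetical worker: returns its inputs so the unpacking is visible."""
    return (ref_name, bam_filename, len(cultivars))


# Pool.imap would call describe(job) once per job tuple like this one.
print(describe(("test.bam", ["Ja", "Ea", "As"], "chr1")))  # → ('chr1', 'test.bam', 3)
```

In Python 3.3 and later, Pool.starmap unpacks each argument tuple itself, so the decorator is only needed with map, imap, or imap_unordered.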
Then you would write your function as:
@map_wrap
def run_test(bam_filename, cultivars, ref_name):
    # open the BAM file inside the worker, so no file handle gets pickled
    bam_fh = pysam.Samfile(bam_filename, "rb")
    print os.getpid(), ref_name, cultivars
    bam_fh.close()
    return (os.getpid(), ref_name)
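To see why the stateless function works where Test(...).run did not, the sketch below tries to pickle each style directly. It is a Python 3 sketch with hypothetical names (Holder, stateless_run, is_picklable), and it uses a plain open file as a stand-in for the pysam.Samfile handle, assuming that handle is likewise unpicklable. Python 3 can pickle many bound methods, but doing so pickles the instance too, so a method of an object holding an open file still fails, just with a different error than the instancemethod one above.

```python
import os
import pickle


class Holder:
    """Stand-in for the Test class: keeps an open file handle as state."""

    def __init__(self, filename):
        # plays the role of pysam.Samfile(bam_filename, "rb") in Test.__init__
        self.fh = open(filename, "rb")

    def run(self, ref_name):
        return (os.getpid(), ref_name)


def stateless_run(filename, ref_name):
    """Module-level worker that only receives plain Python objects."""
    # each call opens, and closes, its own handle
    with open(filename, "rb"):
        return (os.getpid(), ref_name)


def is_picklable(obj):
    """Return True if pickle (which multiprocessing uses for tasks) can serialize obj."""
    try:
        pickle.dumps(obj)
    except (TypeError, AttributeError, pickle.PicklingError):
        return False
    return True


if __name__ == "__main__":
    holder = Holder(os.devnull)
    print(is_picklable(holder.run))  # False: the bound method drags along the open file
    print(is_picklable((os.devnull, ["Ja", "Ea", "As"], "chr1")))  # True
    holder.fh.close()
```

Passing only the file name and opening the file inside the worker keeps every argument a plain string, list, or dict.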
You would then call run_test with:
cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
bam_filename = "/media/usb/tests/test.bam"
bamfile = pysam.Samfile(bam_filename, "rb")
ref_names = bamfile.references
bamfile.close()
pool = Pool()
results = dict(pool.imap(run_test, ((bam_filename, cultivars, ref)
                                    for ref in ref_names)))
pool.close()
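On Python 3 the same call can be written with Pool.starmap, which unpacks each argument tuple itself, so the decorator is not needed. This is a minimal sketch assuming Python 3.3+; run_test_stub, make_jobs, and run_all are hypothetical names, and the stub skips pysam entirely so it runs without a BAM file. The if __name__ == "__main__": guard is needed where workers are started by re-importing the main module (the spawn start method, the default on Windows and macOS).

```python
import os
from multiprocessing import Pool


def run_test_stub(bam_filename, cultivars, ref_name):
    """Hypothetical stand-in for run_test: no pysam, just report the work."""
    return (ref_name, os.getpid())


def make_jobs(bam_filename, cultivars, ref_names):
    """Build one plain-object argument tuple per reference."""
    return [(bam_filename, cultivars, ref) for ref in ref_names]


def run_all(bam_filename, cultivars, ref_names, processes=None):
    """Run run_test_stub over every reference, keyed by reference name."""
    jobs = make_jobs(bam_filename, cultivars, ref_names)
    # starmap unpacks each tuple into positional arguments, like map_wrap does;
    # it blocks until every job has finished, so leaving the with-block is safe
    with Pool(processes) as pool:
        return dict(pool.starmap(run_test_stub, jobs))


if __name__ == "__main__":
    cultivars = "Ja,Ea,As".replace(" ", "").split(",")
    # hypothetical reference names; the original reads them from bamfile.references
    ref_names = ["chr1", "chr2", "chr3"]
    results = run_all("/media/usb/tests/test.bam", cultivars, ref_names)
    print(results)
```

Keying the dict by reference name keeps one entry per reference; keying it by os.getpid(), as in the test above, keeps only the last result from each worker process. Unlike imap, starmap returns a full list only after all jobs finish.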
Hope this helps,
Brad