[Biopython] multiprocessing problem with pysam

Brad Chapman chapmanb at 50mail.com
Sun May 15 15:53:46 UTC 2011


Michal;

[multiprocessing]
> class Test():
>     def __init__(self, bam_filename, cultivars):
>         self.__bam_fh = pysam.Samfile(bam_filename, "rb")
>         self.__cultivars = cultivars
> 
>     def run(self, ref_name):
>         print os.getpid(), ref_name, self.__cultivars
>         return (os.getpid(), ref_name)
[...]
>     pool = Pool()
>     results = dict(pool.imap_unordered(
>         Test(bam_filename, cultivars).run, ref_names))
[...]
> and got the following error:
> 
> Exception in thread Thread-2:
[...]
> PicklingError: Can't pickle <type 'instancemethod'>: attribute
> lookup __builtin__.instancemethod failed

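multiprocessing sends the function and its arguments to the worker
processes by pickling them. In Python 2 a bound method like
Test(...).run can't be pickled at all (that is the instancemethod
error above), and the instance it belongs to carries an open pysam file
handle, which can't be pickled and shipped to another process either.
My suggestion is to use plain functions without associated state and
pass in your information as standard Python objects (strings, lists,
dicts). Pool.imap hands each task to the function as a single
argument, so I use a little decorator that unpacks a tuple of
arguments, which makes writing these functions easier: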

import functools

def map_wrap(f):
    """Call f with the tuple of arguments Pool.imap passes as one item."""
    @functools.wraps(f)
    def wrapper(args):
        return f(*args)
    return wrapper
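The pickling problem is easy to reproduce without pysam. The sketch below assumes Python 3 and uses a threading.Lock as a hypothetical stand-in for the open Samfile handle: the bound method of an object holding it cannot be pickled, while a module-level function decorated with map_wrap pickles by name and unpacks one tuple into positional arguments. StatefulTest, demo_task and can_pickle are illustrative names, not part of the original code.

```python
import functools
import pickle
import threading


def map_wrap(f):
    """Let f be called with a single tuple of arguments, as Pool.imap does."""
    @functools.wraps(f)  # keeps the name "demo_task", so pickle finds it
    def wrapper(args):
        return f(*args)  # Python 3 has no apply(); unpack the tuple instead
    return wrapper


class StatefulTest:
    """Mimics the Test class: it holds an open, unpicklable resource."""

    def __init__(self):
        # Hypothetical stand-in for an open file handle such as a Samfile.
        self._handle = threading.Lock()

    def run(self, ref_name):
        return ref_name


@map_wrap
def demo_task(label, values, ref_name):
    # Only plain Python objects come in, so the task pickles cleanly.
    return ref_name, label, len(values)


def can_pickle(obj):
    """Return True if obj survives pickle.dumps, False otherwise."""
    try:
        pickle.dumps(obj)
        return True
    except (TypeError, pickle.PicklingError):
        return False


print(can_pickle(StatefulTest().run))  # False: pickling drags the lock along
print(can_pickle(demo_task))           # True: pickled as a module-level name
print(demo_task(("test.bam", ["Ja", "Ea"], "chr1")))
```

Because functools.wraps copies the original function's name onto the wrapper, pickle looks up demo_task in its module and finds the wrapper itself, which is what lets the decorated function be sent to worker processes.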

Then you would write your function as:

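import os
import pysam

@map_wrap
def run_test(bam_filename, cultivars, ref_name):
    # Open the BAM file inside the worker, so no file handle is pickled.
    bam_fh = pysam.Samfile(bam_filename, "rb")
    print(os.getpid(), ref_name, cultivars)
    bam_fh.close()
    return (os.getpid(), ref_name)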

and call it with:

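from multiprocessing import Pool
import pysam

if __name__ == "__main__":
    cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
    bam_filename = "/media/usb/tests/test.bam"
    # Read the reference names in the parent process, then close the file.
    bamfile = pysam.Samfile(bam_filename, "rb")
    ref_names = bamfile.references
    bamfile.close()

    # Each task is a tuple of plain Python objects, unpacked by map_wrap.
    pool = Pool()
    results = dict(pool.imap(run_test, ((bam_filename, cultivars, ref)
                                        for ref in ref_names)))
    pool.close()
    pool.join()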
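On Python 3.3 and later, Pool.starmap does the tuple unpacking itself, so the decorator is not strictly needed there; functools.partial is another option for binding the arguments that are the same for every task. The sketch below shows both. summarize_ref is a hypothetical stand-in for run_test that opens no BAM file, so the example runs on its own.

```python
import functools
import os
from multiprocessing import Pool


def summarize_ref(bam_filename, cultivars, ref_name):
    # Hypothetical stand-in for run_test: a real version would open
    # bam_filename with pysam here, inside the worker process.
    return ref_name, (os.getpid(), len(cultivars))


def run_starmap(bam_filename, cultivars, ref_names):
    # starmap unpacks each tuple into positional arguments.
    tasks = [(bam_filename, cultivars, ref) for ref in ref_names]
    with Pool() as pool:
        return dict(pool.starmap(summarize_ref, tasks))


def run_partial(bam_filename, cultivars, ref_names):
    # partial fixes the shared arguments; a partial of a module-level
    # function pickles fine, so imap can send it to the workers.
    worker = functools.partial(summarize_ref, bam_filename, cultivars)
    with Pool() as pool:
        # dict() consumes every result before the pool is shut down.
        return dict(pool.imap(worker, ref_names))


if __name__ == "__main__":
    refs = ["chr1", "chr2", "chr3"]
    print(run_starmap("test.bam", ["Ja", "Ea", "As"], refs))
    print(run_partial("test.bam", ["Ja", "Ea", "As"], refs))
```

Keying the results by reference name, rather than by process ID as in the code above, keeps one entry per reference: a single worker process usually handles several references, so a dict keyed by PID would overwrite earlier results.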

Hope this helps,
Brad