[Biopython] multiprocessing problem with pysam
    Brad Chapman 
    chapmanb at 50mail.com
       
    Sun May 15 15:53:46 UTC 2011
    
    
  
Michal;
[multiprocessing]
> class Test():
>     def __init__(self, bam_filename, cultivars):
>         self.__bam_fh = pysam.Samfile(bam_filename, "rb")
>         self.__cultivars = cultivars
> 
>     def run(self, ref_name):
>         print os.getpid(), ref_name, self.__cultivars
>         return (os.getpid(), ref_name)
[...]
>     pool = Pool()
>     results = dict(pool.imap_unordered(
>         Test(bam_filename, cultivars).run, ref_names))
[...]
> and got the following error:
> 
> Exception in thread Thread-2:
[...]
> PicklingError: Can't pickle <type 'instancemethod'>: attribute
> lookup __builtin__.instancemethod failed
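
multiprocessing pickles the callable and its arguments to send them to
the worker processes, so it is sensitive to passing or calling complex
class objects. Here the bound method Test(...).run cannot be pickled.
My suggestion is to use module-level functions without associated state
attributes and pass in your information as standard Python objects
(strings, lists, dicts). I use a little decorator to make writing
the functions passed easier:
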
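import functools

def map_wrap(f):
    """Let f take several arguments when called through Pool.map/imap."""
    @functools.wraps(f)
    def wrapper(*args, **kwargs):
        # The pool passes each work item as one tuple, so args is
        # (item,) and apply(f, item) calls f(*item). Python 2 only:
        # apply() was removed in Python 3.
        return apply(f, *args, **kwargs)
    return wrapper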

Then you would write your function as:

@map_wrap
def run_test(bam_filename, cultivars, ref_name):
    bam_fh = pysam.Samfile(bam_filename, "rb")
    print os.getpid(), ref_name, cultivars
    return (os.getpid(), ref_name)

and call it with:

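# Imports used by run_test and by the calling code.
import os
import pysam
from multiprocessing import Pool

cultivars = 'Ja,Ea,As'.replace(' ', '').split(',')
bam_filename = "/media/usb/tests/test.bam"
# Read the reference names once in the parent process.
bamfile = pysam.Samfile(bam_filename, "rb")
ref_names = bamfile.references
bamfile.close()
pool = Pool()
# Each work item is a plain tuple of picklable values.
results = dict(pool.imap(run_test, ((bam_filename, cultivars, ref)
                                    for ref in ref_names)))
pool.close()
pool.join()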
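
If you want to check up front whether something can be handed to the
pool, a quick test is to try pickling it yourself. The following is
only a minimal sketch, written for Python 3, and not part of the code
above: is_picklable and HandleHolder are hypothetical names, a
temporary file stands in for the open BAM handle, and "chr1" is a
made-up reference name.

```python
import pickle
import tempfile


def is_picklable(obj):
    """Return True if pickle can serialize obj, which is what
    multiprocessing needs for callables and their arguments."""
    try:
        pickle.dumps(obj)
        return True
    except Exception:
        return False


class HandleHolder(object):
    """Hypothetical stand-in for a class that keeps an open file handle."""

    def __init__(self, handle):
        self.handle = handle

    def run(self, ref_name):
        return ref_name


# A plain tuple of strings and lists pickles without trouble.
work_item = ("test.bam", ["Ja", "Ea", "As"], "chr1")
plain_ok = is_picklable(work_item)

# A bound method drags its instance along, open handle included.
with tempfile.TemporaryFile() as handle:
    method_ok = is_picklable(HandleHolder(handle).run)

print(plain_ok, method_ok)
```

The tuple should come back as picklable and the bound method as not,
which is the same reason for passing the filename around and opening
the file inside the worker function.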

Hope this helps,
Brad