[Biopython] multiprocessing problem with pysam

Brad Chapman chapmanb at 50mail.com
Sun Apr 10 07:15:10 EDT 2011


Michal;

> I have tried to rewrite the following code from
> http://wwwfgu.anat.ox.ac.uk/~andreas/documentation/samtools/api.html
[...]
> with the following multiprocessing code:
[...]
>     pool = Pool()
> 
>     samfile = pysam.Samfile("ex1.bam", "rb")
>     references = samfile.references
> 
>     for reference in samfile.references:
>         print ">", reference
>         pool.apply_async(calc_pileup, [samfile, reference, 100, 120])
[...]
> However, I got the following out:
[...]
> TypeError: _open() takes at least 1 positional argument (0 given)

You are passing the open file handle 'samfile' to your multiprocessing
function. Arguments passed to a worker must be picklable by Python, so
you normally need to stick with more basic data structures.
Specifically, I would suggest passing in the filename and then opening
the pysam file inside the worker function:

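import os
from multiprocessing import Pool
import pysam

def calc_pileup(fname, reference_name, start_pos, end_pos):
    # Open the file inside the worker; only the filename crosses processes.
    samfile = pysam.Samfile(fname, "rb")
    coverages = []
    print("%s %s" % (reference_name, os.getpid()))
    # Assumed body, following the pileup example in the pysam docs.
    for col in samfile.pileup(reference_name, start_pos, end_pos):
        coverages.append((col.pos, col.n))
    samfile.close()
    return coverages

if __name__ == '__main__':
    pool = Pool()
    fname = "ex1.bam"
    samfile = pysam.Samfile(fname, "rb")
    references = samfile.references
    samfile.close()
    # Loop over the saved names, not the closed handle.
    for reference in references:
        print("> %s" % reference)
        pool.apply_async(calc_pileup, [fname, reference, 100, 120])
    pool.close()
    pool.join()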
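
One thing worth knowing about apply_async: it returns an AsyncResult
right away, and an exception raised in the worker only surfaces when you
call .get() on that object. The sketch below shows one way to collect
results so that failures are not silently lost. The worker
region_length and the helper run_all are hypothetical stand-ins that do
not touch pysam; the reference names "seq1" and "seq2" are assumptions
for illustration only.

```python
from multiprocessing import Pool


def region_length(fname, reference_name, start_pos, end_pos):
    """Hypothetical stand-in for calc_pileup; takes only picklable arguments."""
    if end_pos < start_pos:
        raise ValueError("end_pos before start_pos for %s" % reference_name)
    return reference_name, end_pos - start_pos


def run_all(fname, references, start_pos, end_pos):
    """Submit one job per reference and wait for every result."""
    pool = Pool(2)
    try:
        jobs = [pool.apply_async(region_length, (fname, ref, start_pos, end_pos))
                for ref in references]
        # .get() blocks until the job is done and re-raises worker exceptions.
        return [job.get() for job in jobs]
    finally:
        pool.close()
        pool.join()


if __name__ == "__main__":
    # Assumed reference names; in practice take them from samfile.references.
    print(run_all("ex1.bam", ["seq1", "seq2"], 100, 120))
```

Keeping the AsyncResult objects in a list and calling .get() after all
jobs are submitted lets the jobs run in parallel while still reporting
the first worker error in the parent process.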

My more general suggestion with multiprocessing is to start with a
simple workflow and expand. This will let you get a sense of where
your objects may be too complex to pickle and you need to simplify.
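
A rough way to find out whether an object is too complex to pass to a
worker is to try pickling it yourself before handing it to the pool.
The helper is_picklable below is a hypothetical name, and plain
pickle.dumps is only an approximation of what multiprocessing does
internally, so treat this as a quick check rather than a guarantee.

```python
import os
import pickle


def is_picklable(obj):
    """Return True if pickle.dumps accepts obj, False if it raises."""
    try:
        pickle.dumps(obj)
    except Exception:
        return False
    return True


if __name__ == "__main__":
    # Plain data such as a filename, a name and coordinates pickles fine.
    print(is_picklable(("ex1.bam", "seq1", 100, 120)))
    # An open file handle does not; os.devnull is used only as a safe example.
    with open(os.devnull, "rb") as handle:
        print(is_picklable(handle))
```

Running a check like this on each argument as you grow the workflow
points to the object that needs replacing with something simpler, such
as a filename.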

Hope this helps,
Brad



