[Biopython] SeqRecords and multiprocessing
McCoy, Connor O
cmccoy at fhcrc.org
Fri Apr 22 16:20:27 UTC 2011
Hello,
I'm having trouble handing SeqRecords created from Roche .sff files off to subprocesses via multiprocessing pipes / queues. It looks like the issue is unpickling the Bio.SeqRecord._RestrictedDict letter_annotations on the instance. Does anyone have any suggestions for a way around this?
Currently, I'm changing the internal _per_letter_annotations to a normal dict prior to placing the object in a pipe/queue, but that doesn't seem like the best way. I also tried adding a __getnewargs__ method to the _RestrictedDict class setting the length, but didn't have success.
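The workaround described above can be sketched without Biopython. This is only a rough illustration: `RestrictedDictStandIn` and `to_plain_dict` are hypothetical names, and the stand-in class only imitates the length check that `_RestrictedDict` appears to perform (judging from the traceback further down); it is not the real Biopython class.

```python
import pickle


class RestrictedDictStandIn(dict):
    """Hypothetical stand-in for Bio.SeqRecord._RestrictedDict (not the real class)."""

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Reject values whose length differs from the sequence length.
        if not hasattr(value, "__len__") or len(value) != self._length:
            raise TypeError("value must be a sequence of length %i" % self._length)
        dict.__setitem__(self, key, value)


def to_plain_dict(annotations):
    """Return a plain dict copy, which pickles without calling the custom __setitem__."""
    return dict(annotations)


annotations = RestrictedDictStandIn(4)
annotations["phred_quality"] = [30, 31, 32, 33]

# Protocol 2 is used here as an assumption about what the pipe does internally.
payload = pickle.dumps(to_plain_dict(annotations), 2)
restored = pickle.loads(payload)
```

The obvious cost of this approach is that the receiving process gets an ordinary dict, so the length check is lost on that side.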
Here's a quick script that can be run in Tests/Roche to reproduce the behavior:
#!/usr/bin/env python

import multiprocessing

from Bio import SeqIO

def print_from_pipe(pipe):
    "Print the first object from the given pipe, return"
    o = pipe.recv()
    print o

def main():
    with open('E3MFGYR02_random_10_reads.sff', 'rb') as fp:
        seqrecord = next(SeqIO.parse(fp, 'sff'))

    conn_recv, conn_send = multiprocessing.Pipe(False)

    print '-'*79
    print 'SeqRecord with _RestrictedDict'
    print '-'*79
    p1 = multiprocessing.Process(target=print_from_pipe, args=(conn_recv, ))
    p1.start()
    conn_send.send(seqrecord.letter_annotations)
    conn_send.close()
    p1.join()

    # Without _RestrictedDict
    print '-'*79
    print 'Letter annotations converted to dict'
    print '-'*79

    # Change to standard dict
    seqrecord._per_letter_annotations = seqrecord._per_letter_annotations.copy()
    conn_recv, conn_send = multiprocessing.Pipe(True)
    p2 = multiprocessing.Process(target=print_from_pipe, args=(conn_recv, ))
    p2.start()
    conn_send.send(seqrecord)
    conn_send.close()
    p2.join()

if __name__ == '__main__':
    main()
It yields:
-------------------------------------------------------------------------------
SeqRecord with _RestrictedDict
-------------------------------------------------------------------------------
Process Process-1:
Traceback (most recent call last):
  File "/mnt/orca/home/phs_grp/matsengrp/local/encap/python-2.7.1/lib/python2.7/multiprocessing/process.py", line 232, in _bootstrap
    self.run()
  File "/mnt/orca/home/phs_grp/matsengrp/local/encap/python-2.7.1/lib/python2.7/multiprocessing/process.py", line 88, in run
    self._target(*self._args, **self._kwargs)
  File "test_mp.py", line 9, in print_from_pipe
    o = pipe.recv()
  File "/mnt/orca/home/cpb_home/cmccoy/development/biopython/Bio/SeqRecord.py", line 33, in __setitem__
    or len(value) != self._length:
AttributeError: '_RestrictedDict' object has no attribute '_length'
-------------------------------------------------------------------------------
Letter annotations converted to dict
-------------------------------------------------------------------------------
ID: E3MFGYR02JWQ7T
Name: E3MFGYR02JWQ7T
.... more normal printing ...
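A possible reading of the traceback: with pickle protocol 2, a dict subclass seems to be recreated without calling `__init__`, and its items are put back through `__setitem__` before the instance attributes (including `_length`) are restored. If that is right, defining `__reduce__` so the object is rebuilt through its constructor might avoid the error. The sketch below uses a hypothetical stand-in class, not the real `_RestrictedDict`, and `PicklableRestrictedDict` and `round_trip` are illustrative names only.

```python
import pickle


class RestrictedDictStandIn(dict):
    """Hypothetical stand-in for Bio.SeqRecord._RestrictedDict (not the real class)."""

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Reject values whose length differs from the sequence length.
        if not hasattr(value, "__len__") or len(value) != self._length:
            raise TypeError("value must be a sequence of length %i" % self._length)
        dict.__setitem__(self, key, value)


class PicklableRestrictedDict(RestrictedDictStandIn):
    """Assumed fix: rebuild through __init__ so _length exists before items are set."""

    def __reduce__(self):
        # (callable, args, state, listitems, dictitems): on unpickling,
        # callable(*args) runs first, then the items go through __setitem__.
        return (self.__class__, (self._length,), None, None, iter(self.items()))


def round_trip(obj):
    """Pickle and unpickle obj with protocol 2."""
    return pickle.loads(pickle.dumps(obj, 2))


# The unmodified stand-in fails on unpickling, like the traceback above.
broken = RestrictedDictStandIn(4)
broken["phred_quality"] = [30, 31, 32, 33]
try:
    round_trip(broken)
    failure = None
except AttributeError as err:
    failure = str(err)

# The __reduce__ variant survives the round trip with its length check intact.
fixed = PicklableRestrictedDict(4)
fixed["phred_quality"] = [30, 31, 32, 33]
restored = round_trip(fixed)
```

This would also be consistent with `__getnewargs__` not helping: those arguments go to `__new__`, while `_length` is set in `__init__`, which is not called on unpickling.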
Thanks a lot,
Connor McCoy