[Biopython] SeqRecords and multiprocessing

McCoy, Connor O cmccoy at fhcrc.org
Fri Apr 22 16:20:27 UTC 2011


Hello,

I'm having trouble passing SeqRecords created from Roche .sff files to subprocesses via multiprocessing pipes and queues. The problem appears to be unpickling the instance's letter_annotations, which is a Bio.SeqRecord._RestrictedDict. Does anyone have suggestions for a way around this?

Currently, I'm replacing the internal _per_letter_annotations with a normal dict before placing the object in a pipe or queue, but that doesn't seem like the best way. I also tried adding a __getnewargs__ method to the _RestrictedDict class to supply the length, but without success.

Here's a quick script that can be run in Tests/Roche to reproduce the behavior:

    #!/usr/bin/env python

    import multiprocessing

    from Bio import SeqIO

    def print_from_pipe(pipe):
        "Print the first object from the given pipe, return"
        o = pipe.recv()
        print o

    def main():
        with open('E3MFGYR02_random_10_reads.sff', 'rb') as fp:
            seqrecord = next(SeqIO.parse(fp, 'sff'))

        conn_recv, conn_send = multiprocessing.Pipe(False)

        print '-'*79
        print 'SeqRecord with _RestrictedDict'
        print '-'*79
        p1 = multiprocessing.Process(target=print_from_pipe, args=(conn_recv, ))
        p1.start()
        conn_send.send(seqrecord.letter_annotations)
        conn_send.close()
        p1.join()

        # Without _RestrictedDict
        print '-'*79
        print 'Letter annotations converted to dict'
        print '-'*79
        # Change to a standard dict (dict.copy() returns a plain dict)
        seqrecord._per_letter_annotations = seqrecord._per_letter_annotations.copy()
        conn_recv, conn_send = multiprocessing.Pipe(True)
        p2 = multiprocessing.Process(target=print_from_pipe, args=(conn_recv, ))
        p2.start()
        conn_send.send(seqrecord)
        conn_send.close()
        p2.join()

    if __name__ == '__main__':
        main()

It yields:

    -------------------------------------------------------------------------------
    SeqRecord with _RestrictedDict
    -------------------------------------------------------------------------------
    Process Process-1:
    Traceback (most recent call last):
      File "/mnt/orca/home/phs_grp/matsengrp/local/encap/python-2.7.1/lib/python2.7/multiprocessing/process.py", line 232, in _bootstrap
        self.run()
      File "/mnt/orca/home/phs_grp/matsengrp/local/encap/python-2.7.1/lib/python2.7/multiprocessing/process.py", line 88, in run
        self._target(*self._args, **self._kwargs)
      File "test_mp.py", line 9, in print_from_pipe
        o = pipe.recv()
      File "/mnt/orca/home/cpb_home/cmccoy/development/biopython/Bio/SeqRecord.py", line 33, in __setitem__
        or len(value) != self._length:
    AttributeError: '_RestrictedDict' object has no attribute '_length'
    -------------------------------------------------------------------------------
    Letter annotations converted to dict
    -------------------------------------------------------------------------------
    ID: E3MFGYR02JWQ7T
    Name: E3MFGYR02JWQ7T
    .... more normal printing ...
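
The AttributeError above is consistent with how pickle rebuilds a dict subclass: by default the stored items are re-inserted through __setitem__ before the instance __dict__ (and so _length) is restored. Below is a minimal sketch of that behaviour and of one possible way around it, a __reduce__ method that hands the length back to the constructor. LengthCheckedDict and PicklableLengthCheckedDict are hypothetical stand-ins written for illustration; they are not Biopython code, and whether this approach suits _RestrictedDict itself is an assumption.

```python
import pickle


class LengthCheckedDict(dict):
    """Hypothetical stand-in: a dict whose values must all have one length."""

    def __init__(self, length):
        dict.__init__(self)
        self._length = int(length)

    def __setitem__(self, key, value):
        # Fails during unpickling if _length has not been restored yet.
        if len(value) != self._length:
            raise TypeError("value must have length %i" % self._length)
        dict.__setitem__(self, key, value)


class PicklableLengthCheckedDict(LengthCheckedDict):
    """Same stand-in, but pickle is told to call the constructor first."""

    def __reduce__(self):
        # (callable, args, state, list items, dict items): pickle calls
        # cls(length), so _length exists before the items are re-inserted.
        return (self.__class__, (self._length,), None, None,
                iter(self.items()))


def roundtrip(obj):
    """Pickle and unpickle obj, similar to sending it through a pipe."""
    return pickle.loads(pickle.dumps(obj, pickle.HIGHEST_PROTOCOL))


def default_pickling_fails():
    """Return True if the plain stand-in hits the missing-attribute error."""
    d = LengthCheckedDict(3)
    d["phred_quality"] = [10, 20, 30]
    try:
        roundtrip(d)
    except AttributeError:
        return True
    return False


def reduce_pickling_works():
    """Return the restored copy of the __reduce__-based stand-in."""
    d = PicklableLengthCheckedDict(3)
    d["phred_quality"] = [10, 20, 30]
    return roundtrip(d)
```

With these stand-ins, roundtrip() raises for the first class and returns an equal copy for the second. Converting to a plain dict, as in the script, sidesteps the problem differently: the copy that is sent no longer carries the length check.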

Thanks a lot,
Connor McCoy


