[Biopython-dev] 1/7 newest questions tagged biopython - Stack Overflow

Feed My Inbox updates at feedmyinbox.com
Fri Jan 7 08:45:34 UTC 2011


// python multiprocessing each with own subprocess (Kubuntu,Mac)
// January 6, 2011 at 4:25 PM

http://stackoverflow.com/questions/4620041/python-multiprocessing-each-with-own-subprocess-kubuntu-mac
I've created a script that by default creates one multiprocessing Process; then it works fine. When starting multiple processes, it starts to hang, and not always in the same place. The program's about 700 lines of code, so I'll try to summarise what's going on. I want to make the most of my multi-cores, by parallelising the slowest task, which is aligning DNA sequences. For that I use the subprocess module to call a command-line program: 'hmmsearch', which I can feed in sequences through /dev/stdin, and then I read out the aligned sequences through /dev/stdout. I imagine the hang occurs because of these multiple subprocess instances reading / writing from stdout / stdin, and I really don't know the best way to go about this...
I was looking into os.fdopen(...) & os.tmpfile(), to create temporary filehandles or pipes where I can flush the data through. However, I've never used either before & I can't picture how to do that with the subprocess module. Ideally I'd like to bypass using the hard-drive entirely, because pipes are much better with high-throughput data processing!
Any help with this would be super wonderful!!

import multiprocessing, subprocess
from Bio import SeqIO

class align_seq( multiprocessing.Process ):
    def __init__( self, inPipe, outPipe, semaphore, options ):
        multiprocessing.Process.__init__(self)
        self.in_pipe = inPipe          ## Sequences in
        self.out_pipe = outPipe        ## Alignment out
        self.options = options.copy()  ## Modifiable sub-environment
        self.sem = semaphore

    def run(self):
        inp = self.in_pipe.recv()
        while inp != 'STOP':
            seq_record , HMM = inp  # seq_record is only ever one Bio.Seq.SeqRecord object at a time.
                                    # HMM is a file location.
            align_process = subprocess.Popen( ['hmmsearch', '-A', '/dev/stdout', '-o',os.devnull, HMM, '/dev/stdin'], shell=False, stdin=subprocess.PIPE, stdout=subprocess.PIPE )
            self.sem.acquire()
            align_process.stdin.write( seq_record.format('fasta') )
            align_process.stdin.close()
            for seq in SeqIO.parse( align_process.stdout, 'stockholm' ):  # get the alignment output
                self.out_pipe.send_bytes( seq.seq.tostring() ) # send it to consumer
            align_process.wait()   # Don't know if there's any need for this??
            self.sem.release()
            align_process.stdout.close()
            inp = self.in_pipe.recv()  
        self.in_pipe.close()    #Close handles so don't overshoot max. limit on number of file-handles.
        self.out_pipe.close()


--
Website: http://stackoverflow.com/questions/tagged/?tagnames=biopython&sort=newest

Account Login: 
https://www.feedmyinbox.com/members/login/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

Unsubscribe here: 
http://www.feedmyinbox.com/feeds/unsubscribe/520218/8a8361b28bdb22206ffa317797e7067a6f101db5/?utm_source=fmi&utm_medium=email&utm_campaign=feed-email

--
This email was carefully delivered by FeedMyInbox.com. 
PO Box 682532 Franklin, TN 37068




More information about the Biopython-dev mailing list