[BioPython] BLAST in a python generator

Kael Fischer kael at sonic.net
Tue Oct 26 18:56:08 EDT 2004


Regarding threads and Biopython.

I've been experimenting with keeping BLAST running in a separate process,
wrapped in a Python generator, and calling the generator's .next() method
whenever I want to run the next query.  The query is held in a StringIO
buffer that the generator can read.

The idea is that the overhead of reading the database need not be repeated
for each query, because the running BLAST process is part of the
generator's state.
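
The pattern can be shown without BLAST at all. The following is a minimal
sketch in current Python 3 syntax (next(gen) rather than gen.next()); the
names query_pipe and setup_log are hypothetical, and the upper-casing merely
stands in for a real search. It only illustrates that the setup step runs
once while the shared buffer is refilled between calls.

```python
import io

def query_pipe(in_buf, setup_log):
    """Generator whose one-time setup survives between next() calls."""
    setup_log.append("database loaded")   # stands in for the costly start-up
    while True:
        in_buf.seek(0)
        query = in_buf.read()             # take whatever the caller queued
        in_buf.seek(0)
        in_buf.truncate()                 # empty the buffer for the next round
        yield query.upper()               # placeholder for real search output

buf = io.StringIO()
log = []
pipe = query_pipe(buf, log)

buf.write("acgt")
first = next(pipe)       # runs the setup, then handles the first query
buf.write("ttga")
second = next(pipe)      # resumes inside the loop; setup is not repeated
```

After both calls, log holds a single entry, which is the point of the
design: everything above the while loop is paid for only once.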

This is the first time I've written a generator.  Unfortunately I don't seem to 
be able to get _all_ of the output of the BLAST record.  Most of the time 
my select loops return only part of the result. The code below is one of 
several schemes I've tried.  

For those interested in the idea, here is some of the code:
(no Biopython in this snippet)

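import os
from select import select
from tempfile import NamedTemporaryFile

# genomeDB, blast_exe and formatdb_exe must be defined elsewhere;
# they are not shown in this snippet.

def BLASTpipe(inBuf, blastDB = genomeDB):
    """Generator for a BLAST process.

    inBuf is a StringIO buffer that contains one or more query
    sequences.  Each call to .next() sends the queries in inBuf to
    BLAST, empties inBuf, and returns a tuple of the output and
    error strings.
    """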

    # Format DB, if necessary
    if not os.access(blastDB + '.nhr', os.R_OK) \
           or not os.access(blastDB + '.nin', os.R_OK) \
           or not os.access(blastDB + '.nsq', os.R_OK):
        # db is not formatted
        tmpDbFile = NamedTemporaryFile()
        userDbFile = file(blastDB,'r')
        tmpDbFile.write(userDbFile.read())
        userDbFile.close()
        tmpDbFile.flush()
        blastDB = tmpDbFile.name
        # format db
        os.system('%s -pF -l /dev/null -i%s' % (formatdb_exe, blastDB))

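    # Start BLAST once; the pipes stay open between .next() calls
    # ('t' = text mode, 1 = line-buffered)
    blast_in, blast_out, blast_err = os.popen3(blast_exe + \
                                         ' -p blastn -d %s '  % (blastDB), 't',1)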
    while True:
        outString = ''
        errString = ''

        
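        # Send the queued queries to BLAST
        inBuf.seek(0)
        inQuery = inBuf.read()
        blast_in.write(inQuery)
        blast_in.flush()

        # Empty the buffer so the caller can queue the next queries
        inBuf.seek(0)
        inBuf.truncate()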

        # Collect output until neither pipe has had data for 0.5 s.
        # A quiet pipe does not prove BLAST has finished, so a slow
        # result can still be cut short here.
        readyReaders = select([blast_out, blast_err], [], [], 0.5)[0]
        while readyReaders:
            # os.read bypasses the file objects' own buffers, which
            # select() cannot see into
            if blast_out in readyReaders:
                chunk = os.read(blast_out.fileno(), 4096)
                if not chunk:
                    break               # EOF: BLAST closed its stdout
                outString += chunk
            if blast_err in readyReaders:
                errString += os.read(blast_err.fileno(), 4096)

            readyReaders = select([blast_out, blast_err], [], [], 0.5)[0]

        yield outString, errString

# end
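
For readers on Python 3, where os.popen3 no longer exists, the read loop
might be sketched with subprocess.Popen instead. This is an illustration
under stated assumptions, not a drop-in replacement: the drain helper is a
hypothetical name, the child is a stand-in Python one-liner rather than
BLAST, and select() on pipes works only on POSIX systems. Unlike a pure
timeout, it also stops cleanly when the child closes its pipes.

```python
import os
import select
import subprocess
import sys

def drain(proc, timeout=0.5):
    """Collect stdout and stderr until both reach EOF or go quiet for `timeout`."""
    chunks = {proc.stdout: [], proc.stderr: []}
    open_pipes = [proc.stdout, proc.stderr]
    while open_pipes:
        ready = select.select(open_pipes, [], [], timeout)[0]
        if not ready:
            break                               # nothing arrived within `timeout`
        for pipe in ready:
            data = os.read(pipe.fileno(), 4096)  # read straight from the descriptor
            if data:
                chunks[pipe].append(data)
            else:
                open_pipes.remove(pipe)         # empty read means EOF on this pipe
    return b"".join(chunks[proc.stdout]), b"".join(chunks[proc.stderr])

# Stand-in child process (not BLAST): writes to both streams and exits.
child_code = "import sys; sys.stdout.write('hit'); sys.stderr.write('warn')"
proc = subprocess.Popen(
    [sys.executable, "-c", child_code],
    stdout=subprocess.PIPE, stderr=subprocess.PIPE,
)
out, err = drain(proc, timeout=10)
proc.wait()
proc.stdout.close()
proc.stderr.close()
```

A long-lived child never reaches EOF between queries, so in that setting
this loop still falls back on the timeout to decide when a result is done.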

Comments?
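
One possible direction for the partial-output problem is to read up to an
explicit end-of-result marker instead of relying on a timeout. The sketch
below (Python 3) assumes such a marker exists: the child is a hypothetical
stand-in script that prints "//END" after every reply, and nothing in this
post establishes that BLAST output offers an equivalent delimiter.

```python
import subprocess
import sys

# Stand-in child (hypothetical, not BLAST): echoes each input line in
# upper case, then prints an end marker and flushes.
CHILD = (
    "import sys\n"
    "for line in sys.stdin:\n"
    "    print(line.strip().upper())\n"
    "    print('//END')\n"
    "    sys.stdout.flush()\n"
)

def marker_pipe(queries):
    """Yield one complete reply per query, reading up to the end marker."""
    proc = subprocess.Popen(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.PIPE, stdout=subprocess.PIPE, text=True,
    )
    try:
        for query in queries:
            proc.stdin.write(query + "\n")
            proc.stdin.flush()                  # make sure the child sees the query
            reply = []
            # readline() blocks until a full line arrives, so no timeout is needed
            for line in iter(proc.stdout.readline, ""):
                if line.strip() == "//END":
                    break                       # reply is complete
                reply.append(line)
            yield "".join(reply)
    finally:
        proc.stdin.close()                      # lets the child exit its loop
        proc.wait()
        proc.stdout.close()

results = list(marker_pipe(["acgt", "ttga"]))
```

The trade-off is that a blocking read waits forever if the marker never
arrives, so this only suits a child whose output format is known.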

Kael

--
Kael Fischer, Ph.D.
DeRisi Lab, University of California San Francisco
Desk: 415-514-4320
kael at derisilab.ucsf.edu


