[Biopython] Problems parsing with PSIBlastParser

Miguel Ortiz Lombardia ibdeno at gmail.com
Tue Oct 13 09:57:13 EDT 2009


On 13 Oct 2009, at 15:36, Peter wrote:

> On Tue, Oct 13, 2009 at 12:58 PM, Miguel Ortiz Lombardia
> <ibdeno at gmail.com> wrote:
>>>
>>> Hmm - the switch to using subprocess (on Python 2.4 or later) was made
>>> in October 2008, and would have first appeared in Biopython 1.49. Maybe
>>> you were using Biopython 1.48 before - or the issue is something else.
>>>
>>> Peter
>>
>>
>> It may well have been 1.48... Having a closer look at the files from my
>> last successful runs, I discover they actually come from November 2008...
>>
>> I'm now running the tests you suggested.
>
> Let me know what they show. How long do these BLAST runs take?
> Perhaps I was ambitious with the number of suggestions to try ;)

It took a long time, because I wanted to reproduce the same situation.
All three of your suggestions worked!
So I have at least a work-around now.

>
> Assuming the problem is with how we are calling the BLAST tool via the
> subprocess module, I have two suggested fixes in mind. The first is a
> change to the _invoke_blast() function in Bio/Blast/NCBIStandalone.py,
> essentially replace these lines:
>
>    blast_process.stdin.close()
>    return blast_process.stdout, blast_process.stderr
>
> With this:
>
>    stdout, stderr = blast_process.communicate()
>    from StringIO import StringIO
>    return StringIO(stdout), StringIO(stderr)
>
> We had to make a similar change to Bio.Clustalw for Bug 2804. This uses
> subprocess to buffer the data in order to avoid any deadlock reading from
> the handles. I hadn't made this change before as it imposes a memory
> overhead (and BLAST output is often *very* large, especially as XML),
> and until now there hadn't been any problems reported. It would be worth
> trying in your situation (even just to confirm the source of the error),
> but I don't think we should make this change for the official distribution.
>

You're right, it is probably not justified if I'm the only one with this
problem.
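
For reference, here is a minimal sketch of the buffering idea behind the first fix, written for a current Python rather than the 2009 code base. The helper name run_buffered and the stand-in child process are assumptions for illustration; this is not the Biopython function itself. It shows communicate() reading both pipes to the end before anything is handed back, which is what avoids a deadlock when the child fills one pipe while the caller is still reading the other.

```python
import subprocess
import sys
from io import StringIO  # the 2009 code used the Python 2 StringIO module


def run_buffered(cmd):
    """Run cmd, hold all of its output in memory, return two file-like handles."""
    child = subprocess.Popen(
        cmd,
        stdin=subprocess.PIPE,
        stdout=subprocess.PIPE,
        stderr=subprocess.PIPE,
        universal_newlines=True,  # text mode, so communicate() returns str
    )
    # communicate() closes stdin, drains stdout and stderr together, then waits.
    out, err = child.communicate()
    return StringIO(out), StringIO(err)


# Stand-in for a BLAST run: a child that writes heavily to both streams.
noisy = [
    sys.executable,
    "-c",
    "import sys\n"
    "for i in range(20000):\n"
    "    sys.stdout.write('out %d\\n' % i)\n"
    "    sys.stderr.write('err %d\\n' % i)\n",
]
out_handle, err_handle = run_buffered(noisy)
print(len(out_handle.readlines()), len(err_handle.readlines()))
```

The cost is the one Peter describes: the whole output sits in memory until the child exits.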

> The second option (which I mentioned before) is to tell blastpgp to write
> its output directly to a file, and then parse the file. This is how I
> normally run large BLAST jobs. This is possible but not elegant via the
> function Bio.Blast.NCBIStandalone.blastpgp (which always returns
> stdout/stderr handles). Bug 2654 has an example,
> http://bugzilla.open-bio.org/show_bug.cgi?id=2654
>
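
A rough sketch of this file-based approach, bypassing the Biopython helper and calling the tool with subprocess directly. The function names, file names, and database name are hypothetical; the flags are the legacy blastpgp options for the query (-i), database (-d), number of passes (-j), and output file (-o). It is not the code from Bug 2654.

```python
import subprocess


def blastpgp_to_file_cmd(query, database, out_path, iterations=3, exe="blastpgp"):
    """Build an argument list for legacy blastpgp writing its report to out_path."""
    return [
        exe,
        "-i", query,            # input query file
        "-d", database,         # database name
        "-j", str(iterations),  # number of PSI-BLAST passes
        "-o", out_path,         # report goes to this file instead of stdout
    ]


def run_to_file(cmd):
    """Run the command, wait for it to finish, and return its exit code."""
    return subprocess.call(cmd)


# Hypothetical inputs; nothing is executed until run_to_file(cmd) is called.
cmd = blastpgp_to_file_cmd("query.fasta", "nr", "psiblast_report.txt")
print(" ".join(cmd))
```

Once run_to_file(cmd) has returned 0, the report file can be opened and the handle passed to the parser as usual. Because no pipes are involved, there is nothing to deadlock on.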
> However, what I want to recommend instead is to use the more flexible
> Bio.Blast.Applications objects (in this case, the class
> BlastpgpCommandline). I had planned to update the BLAST chapter
> of the Biopython Tutorial to cover this, but it didn't happen in time for
> the Biopython 1.52 release. However, the alignment chapter goes
> through several examples of this style of command line tool wrapper,
> and the BLAST wrappers work in exactly the same way.
>
> Using these "lower level" application wrappers, it is up to you to invoke
> subprocess (or another system call) as you see fit (e.g. with pipes).
> This is more flexible than the old Bio.Blast.NCBIStandalone.blastpgp
> function (and others like it) where the behaviour could not be set.

I will explore this possibility; it definitely seems more elegant than
the other one (as in Bug 2654).
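
As a note on what invoking subprocess "as you see fit" could look like with pipes, here is one possible sketch. It does not use the Bio.Blast.Applications classes; the helper name stream_stdout is an assumption, and the demo command is a stand-in for whatever command line the wrapper object would produce. Standard output is read line by line as it arrives, while standard error is sent to a temporary file so that it can never fill a pipe and stall the child.

```python
import subprocess
import sys
import tempfile


def stream_stdout(cmd):
    """Yield the child's stdout line by line; stderr is spooled to a temp file."""
    with tempfile.TemporaryFile(mode="w+") as err_file:
        child = subprocess.Popen(
            cmd,
            stdout=subprocess.PIPE,
            stderr=err_file,  # a file cannot block the child the way a full pipe can
            universal_newlines=True,
        )
        for line in child.stdout:
            yield line
        child.stdout.close()
        if child.wait() != 0:
            err_file.seek(0)
            raise RuntimeError("command failed: " + err_file.read())


# Stand-in for a BLAST command line; any argument list works here.
demo = [sys.executable, "-c", "for i in range(3): print('hit', i)"]
for line in stream_stdout(demo):
    print(line.rstrip())
```

Unlike buffering everything with communicate(), this keeps memory use flat for very large reports, at the price of handling the process life cycle yourself.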

>
> Feel free to ask for clarification on this - questions now will help for
> rewriting the BLAST chapter later on ;)

I may come back with questions :-)

Thank you very much for your help!

Best,


-- Miguel





