[Biopython] Problems parsing with PSIBlastParser

Miguel Ortiz Lombardia ibdeno at gmail.com
Wed Oct 14 20:15:07 UTC 2009


On 14 Oct 09, at 17:02, Andrea wrote:
> Peter wrote:
>> On Wed, Oct 14, 2009 at 3:28 PM, Andrea <andrea at biodec.com> wrote:
>>
>>> Hi everybody,
>>> I work with BLAST quite often; I would say I have run hundreds of
>>> thousands of blastpgp searches. The "Query 0" output of blastpgp is
>>> quite common for me, and I wrote a patch for my code to remove these
>>> "nasty" lines before passing the output to the parser.
>>>
>>> I found this type of line in at least 1-2% of my runs. And I am
>>> fully sure that I found them both in the output of blast run from
>>> the shell and in the output of blast run via Biopython.
>>>
>>> In my view, the problem is in the blastpgp algorithm, and maybe it
>>> could be handled in Biopython (as I did in my code) by cutting out
>>> these "Query 0" lines, because from the point of view of the
>>> alignments they make no sense. It seems that blastpgp wants to show
>>> which part of the target sequence aligns to the query before the
>>> starting point of the query itself (something like opening a gap at
>>> the beginning of the query).
>>> And this happens "sometimes", without any apparent reason.
>>>
>>
>> Andrea - do you have any small example output files with this
>> problem? If it does occur fairly often (1 to 2% of the time), then
>> we should try and update the parser to cope. Miguel's example
>> is useful for testing while working on a bug fix, but too big to
>> include as part of the unit tests.
>>
>>
> Mmm... I have to search. I have a "cache" of gzipped blastpgp
> outputs, but I'm not sure I have the originals (maybe they were
> already patched). What I'm sure of is that next month I'm going to
> run almost 100,000 blastpgp searches, so I'll certainly find
> something small. ;-)
>>> What I think is that there isn't any problem with Biopython in
>>> wrapping the blastpgp process, and maybe, though I'm not sure, the
>>> difference in the output could be related to small differences in
>>> the parameters of the process (or in the environment... or in the
>>> .ncbirc file).
>>>
>>> I was always able to observe that the blastpgp output via the shell
>>> (bash) and the output of the popen wrapper are identical.
>>>
>>
>> If you saw "Query 0" output at the command line (shell), then that is
>> worth knowing.

All I can say is that this is not what I observe.
1. When I run exactly the same blastpgp search directly from the shell
(I capture the full command line issued in the background by the
Python script with 'ps -a | grep blastpgp'), I have never found the
'Query: 0' lines.
2. When I launch the search from within the Python script and use
'nohup', the problem is reproducible, not random.
3. If the script is launched without 'nohup', that is, if the shell
keeps full control of both standard error and standard output, then
again the problem seems to disappear. I say 'seems' because I haven't
tried it with my longest (more than 1300 aa) sequences.
4. When, from within the Python script, I use the BlastpgpCommandline
class, as Peter suggested, to ask blastpgp to send the output to a
file (the -o option), the problem disappears regardless of whether or
not I use 'nohup'.

Therefore, in my opinion, the problem is not with blastpgp but with
how its output is handled by Python or Biopython.
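A minimal sketch of the workaround in point 4, using Python's standard subprocess module rather than Biopython's BlastpgpCommandline (a swapped-in approach, not the wrapper Miguel used). It assumes the legacy blastpgp executable is on the PATH and uses its -i (query file), -d (database), -j (number of iterations) and -o (output file) options; the function names are hypothetical. Because the report goes to a file, standard output is never parsed, and standard error is captured separately so nothing from it can mix into the report.

```python
import subprocess
from pathlib import Path


def blastpgp_command(query_fasta, database, out_path, iterations=3, exe="blastpgp"):
    """Build a legacy blastpgp command line that writes its report to a file (-o).

    Hypothetical helper; assumes the classic blastpgp options -i, -d, -j, -o.
    """
    return [exe, "-i", str(query_fasta), "-d", str(database),
            "-j", str(iterations), "-o", str(out_path)]


def run_blastpgp_to_file(query_fasta, database, out_path, iterations=3, exe="blastpgp"):
    """Run blastpgp, wait for it to exit, then return the report read from out_path."""
    cmd = blastpgp_command(query_fasta, database, out_path, iterations, exe)
    # The report is in the -o file, so stdout is discarded; stderr is kept apart
    # and only used for the error message if blastpgp exits with a failure code.
    result = subprocess.run(cmd, stdout=subprocess.DEVNULL,
                            stderr=subprocess.PIPE, text=True)
    if result.returncode != 0:
        raise RuntimeError(f"blastpgp failed ({result.returncode}): {result.stderr}")
    return Path(out_path).read_text()
```

Reading the file only after the process has exited means the parser never sees a partially written or interleaved stream, which is the situation points 2 and 3 suggest is involved.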

>>
> I think so.
>>> Miguel, could you check whether everything is really identical?
>>> I'm really surprised by such strange behaviour....
>>
>> Maybe the environment variables are different or something?

The command options are exactly the same; see above. I am surprised
too, but I don't think blastpgp is sensitive to any environment
variable, and I don't see how they could change between an in-script
run and a standalone run.

>>
>>> Still, in my view there isn't any problem in Biopython, and maybe
>>> Miguel will be able to discover some differences in the way
>>> blastpgp is launched. I would suggest developing a patch (I could
>>> submit mine) that removes the "Query 0" lines.

I couldn't find any differences, so I'm afraid I can't help... I'm
still testing the script; I will let you know if I run into this
problem again.
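A minimal sketch of the kind of pre-filter Andrea describes, assuming the offending alignment lines start with "Query:" (or "Query") followed by the coordinate 0. The function name and the regular expression are hypothetical; this is not Andrea's actual patch. It only drops the matching lines; whether the parser then copes with the remaining match and Sbjct lines of that alignment block is not established in this thread.

```python
import io
import re

# Assumed shape of the offending lines: "Query: 0" (colon optional) followed
# by whitespace. "Query: 10 ..." does not match, because the coordinate must be 0.
QUERY_ZERO = re.compile(r"^Query:?\s+0\s")


def strip_query_zero_lines(handle):
    """Return a new in-memory text handle with the 'Query: 0' lines removed."""
    kept = [line for line in handle if not QUERY_ZERO.match(line)]
    return io.StringIO("".join(kept))
```

The returned handle can be passed to the text parser in place of the original file handle, for example one obtained with open() on a saved blastpgp report.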

>>>
>> Could you upload your "Query 0" patch to Bug 2927?
>> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>>
> Right now I'm quite busy, because I'm working on a different project
> and have deliveries to manage...
> but I will certainly upload my patch ASAP.
>>
>>> I apologize if I misunderstood the problem, and for joining the
>>> discussion at this point (maybe when the discussion is already
>>> finished)...
>>>
>>
>> Well I don't (yet) understand what the problem is either ;)
>>
>> Peter
>>
> Ciao
> andrea


Best,



-- Miguel






