[Biopython] Problems parsing with PSIBlastParser

Andrea andrea at biodec.com
Wed Oct 14 15:02:40 UTC 2009


Peter wrote:
> On Wed, Oct 14, 2009 at 3:28 PM, Andrea <andrea at biodec.com> wrote:
>   
>> Hi everybody,
>> I work with blast quite often and have run hundreds of thousands
>> of blastpgp jobs. The "Query 0" output of blastpgp is quite common for me,
>> and I wrote a patch to my code to remove these "nasty" lines before passing
>> the output to the parser.
>>
>> I found this type of line in at least 1-2% of my runs, and I'm fully sure
>> that I found them both in the output of blast run via the shell and in the
>> output of blast run via Biopython.
>>
>> The problem, in my opinion, is in the blastpgp algorithm, and maybe it
>> could be handled in Biopython (as I did in my code) by cutting out these
>> "Query 0" lines, because from the point of view of the alignments
>> they don't make any sense. It seems that blastpgp wants to show
>> which part of the target sequence aligns to the query before the
>> starting point of the query itself (something like opening a gap at the
>> beginning of the query).
>> This happens only "sometimes", and without any apparent reason.
>>     
>
> Andrea - do you have any small example output files with this
> problem? If it does occur fairly often (1 to 2% of the time), then
> we should try and update the parser to cope. Miguel's example
> is useful for testing while working on a bug fix, but too big to
> include as part of the unit tests.
>
>   
Hmm... I'll have to search. I have a "cache" of gzipped blastpgp outputs,
but I'm not sure I have the originals (they may already be patched). What I'm
sure of is that in the next month I'm going to run almost 100,000 blastpgp
jobs, so I'll surely find something small. ;-)
>> What I think is that there isn't any problem with Biopython in wrapping
>> the blastpgp process and that maybe, but I'm not sure, the difference in the
>> output could be related to small differences in the parameters of the process
>> (or in the environment... or in the .ncbirc file).
>>
>> I have always observed that the blastpgp output via the shell (bash)
>> is identical to the output of the popen wrapper.
>>     
>
> If you saw "Query 0" output at the command line (shell), then that is
> worth knowing.
>
>   
I think so.
>> Miguel, could you check whether everything really is identical? Because I'm
>> really surprised by such strange behaviour...
>>     
>
> Maybe the environment variables are different or something?
>
>   
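One way to check whether the shell output and the wrapper output really are identical is a line-by-line diff. This is only a sketch: `diff_outputs` is a hypothetical helper, and it assumes both outputs have already been captured as strings.

```python
import difflib


def diff_outputs(shell_text, wrapper_text):
    """Return unified-diff lines between two outputs (empty list if identical)."""
    return list(
        difflib.unified_diff(
            shell_text.splitlines(),
            wrapper_text.splitlines(),
            fromfile="shell",      # label only, not a real file name
            tofile="wrapper",      # label only, not a real file name
            lineterm="",
        )
    )
```

An empty result means the two runs produced the same text; any "Query 0" line present in only one of them would show up as a `+` or `-` line.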
>> Although in my opinion there isn't any problem in Biopython, and maybe
>> Miguel will be able to discover some difference in the way blastpgp is
>> launched, I would suggest developing a patch (I could submit mine) that
>> removes the "Query 0" lines.
>>     
>
> Could you upload your "Query 0" patch to Bug 2927?
> http://bugzilla.open-bio.org/show_bug.cgi?id=2927
>   
Right now I'm quite busy, because I'm working on a different project and I
have deliveries to manage,
but I will certainly upload my patch ASAP.
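
A minimal sketch of the kind of pre-filter described above (this is not the actual patch, which is not shown in this thread). The exact shape of a "Query 0" line is not given here, so the `QUERY_ZERO` pattern is an assumption, and `strip_query_zero` is a hypothetical name. It only drops the matching lines themselves; if the accompanying match and Sbjct lines also have to go, the filter would need to be extended.

```python
import re

# ASSUMPTION: offending lines start with "Query" (optional colon) and position 0.
# Adjust this pattern to whatever the real blastpgp output looks like.
QUERY_ZERO = re.compile(r"^Query:?\s+0\s")


def strip_query_zero(lines, pattern=QUERY_ZERO):
    """Yield the input lines, skipping those that match `pattern`."""
    for line in lines:
        if not pattern.match(line):
            yield line
```

The filtered lines can be joined and wrapped in a file-like object (for example `io.StringIO`) before being handed to the parser in place of the raw output.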
>   
>> I apologize if I have misunderstood the problem, and for
>> entering the discussion only now (maybe when the
>> discussion is already finished)...
>>     
>
> Well I don't (yet) understand what the problem is either ;)
>
> Peter
>   
Ciao
andrea


