[Biopython] Problems parsing with PSIBlastParser
Andrea
andrea at biodec.com
Wed Oct 14 14:28:17 UTC 2009
Hi everybody,
I work with BLAST quite often, and I have run blastpgp hundreds of
thousands of times. The "Query 0" output of blastpgp is quite common for
me, and I wrote a patch for my own code that removes these "nasty" lines
before passing the output to the parser.
I find lines of this type in at least 1-2% of my runs, and I am sure that
I have seen them both in the output of blast run from the shell and in the
output of blast run via Biopython.
In my opinion the problem lies in the blastpgp algorithm. It could perhaps
be handled in Biopython (as I did in my code) by cutting out these
"Query 0" lines, because they make no sense from the point of view of the
alignments. It seems that blastpgp wants to show which part of the target
sequence aligns before the starting point of the query itself (something
like opening a gap at the beginning of the query).
This happens only "sometimes", and without any apparent reason.
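
The idea of cutting out these lines can be sketched as follows. This is
only an illustration, not the patch mentioned above. It assumes that a
spurious block consists of a "Query:" line whose start coordinate is 0,
an optional match/blank line, and the paired "Sbjct:" line, and that all
three should be dropped. The name strip_query_zero is hypothetical.

```python
import re

# Assumption: a spurious block starts with a "Query:" line whose start
# coordinate is exactly 0.
QUERY_ZERO = re.compile(r"^Query:\s+0(\s|$)")


def strip_query_zero(lines):
    """Yield the lines of a blastpgp report without "Query: 0" blocks."""
    skipping = False
    for line in lines:
        if QUERY_ZERO.match(line):
            skipping = True  # drop the "Query: 0" line itself
            continue
        if skipping:
            if line.startswith("Sbjct:"):
                skipping = False  # drop the paired Sbjct line, then resume
                continue
            if line.strip() == "" or line[:1].isspace():
                continue  # drop the blank or match line inside the block
            skipping = False  # unexpected line: stop skipping and keep it
        yield line
```

The filtered lines can then be joined and wrapped in a StringIO handle
before being given to the parser.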
I think there is no problem with the way Biopython wraps the blastpgp
process. Maybe, though I am not sure, the difference in the output is
related to small differences in the parameters of the process (or in the
environment, or in the .ncbirc file).
I have always found the blastpgp output obtained from the shell (bash) to
be identical to the output of the popen wrapper.
Miguel, could you check whether everything is really identical? I am
really surprised by such strange behaviour.
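
One way to check this is to compare the two saved outputs line by line.
The sketch below uses the standard difflib module. The function name
diff_blast_outputs is hypothetical, and the two inputs are assumed to be
the full text of the wrapper output and of the shell output (for example
the temp.txt and temp2.txt files mentioned in the quoted message).

```python
import difflib


def diff_blast_outputs(shell_text, wrapper_text):
    """Return only the lines that differ between the two reports.

    Lines present only in the shell output are prefixed with "- ",
    lines present only in the wrapper output with "+ ".
    """
    delta = difflib.ndiff(shell_text.splitlines(), wrapper_text.splitlines())
    # ndiff also emits unchanged ("  ") and hint ("? ") lines; drop those.
    return [d for d in delta if d.startswith(("- ", "+ "))]
```

An empty result means the two outputs are identical. Any "Query: 0"
lines that appear in only one of them show up with a "+ " or "- " prefix.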
Although I do not think there is any problem in Biopython, and Miguel may
yet discover some difference in the way blastpgp is launched, I would
suggest developing a patch (I could submit mine) that removes the
"Query 0" lines.
I apologize if I have misunderstood the problem, and for entering the
discussion at this point (maybe when it is already finished)...
Thanks
Andrea
Miguel Ortiz Lombardia wrote:
> On 14 Oct 09, at 14:37, Peter wrote:
>
>> On Wed, Oct 14, 2009 at 12:30 PM, Miguel Ortiz Lombardia
>> <miguel.ortiz-lombardia at afmb.univ-mrs.fr> wrote:
>>>
>>> Hi again, Peter.
>>>
>>> Well, it turned out that I don't have such a work-around... When I
>>> launched the script as:
>>>
>>> nohup lpbl.py ... &
>>>
>>> against all my sequences it choked at the first one (much longer
>>> than the one I was using as an example) with the very same error.
>>
>> It would take longer as it would wait for BLAST to finish before
>> starting to parse it.
>>
>>> However, this time I have the "temp.txt" file and indeed there are
>>> lines such as:
>>>
>>> Query: 0 -
>>>
>>> Sbjct: 445 G 445
>>>
>>> Query: 0
>>>
>>> Sbjct: 445 G 445
>>>
>>> Query: 0 ------
>>>
>>> Sbjct: 1316 ETNAPV 1321
>>>
>>> present for some alignments and it cannot be parsed by my code.
>>
>> Those do look strange.
>>
>>> When I run blastpgp myself on the command line, with the same
>>> arguments, and capture the standard output to a temp2.txt file, the
>>> latter file does not contain those lines and can be parsed correctly.
>>
>> This is odd, and I am not sure what would cause this.
>>
>>> So, in the end I went back to my code and modified it according to
>>> your recommendation of using the commandline applications. The
>>> relevant part of the code now looks like this:
>>> ...
>>> And it works!
>>
>> Great - I'm glad my vague instructions made sense :)
>>
>
> They were quite clear :-) and the pointer to the alignment tutorial
> helped a lot.
>
>>> Thanks again for your help,
>>
>> At least we have a solution, even if we didn't get to the bottom of
>> the strange BLAST output. I'll close the bug...
>>
>
> That's fine.
>
> Thanks!
>
>
>
> -- Miguel
>
>
>
>
> _______________________________________________
> Biopython mailing list - Biopython at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/biopython