[Biopython] Problems parsing with PSIBlastParser

Peter biopython at maubp.freeserve.co.uk
Wed Oct 14 10:46:48 EDT 2009


On Wed, Oct 14, 2009 at 3:28 PM, Andrea <andrea at biodec.com> wrote:
>
> Hi to everybody,
> I work with BLAST quite often, and I have run hundreds of thousands
> of blastpgp jobs. The "Query 0" output of blastpgp is quite common for me,
> and I wrote a patch for my code to remove these "nasty" lines before
> passing the output to the parser.
>
> I find this type of line in at least 1-2% of my runs, and I'm fully sure
> that I have seen them both in the output of BLAST run from the shell and
> in the output of BLAST run via Biopython.
>
> The problem, in my opinion, is in the blastpgp algorithm, and maybe it
> could be handled in Biopython (as I did in my code) by cutting out these
> "Query 0" lines, because from the point of view of the alignments
> they don't make any sense. It seems that blastpgp wants to show
> which part of the target sequence aligns to the query before the
> starting point of the query itself (something like opening a gap at the
> beginning of the query).
> This happens only "sometimes", and without any apparent reason.

Andrea - do you have any small example output files with this
problem? If it does occur fairly often (1 to 2% of the time), then
we should try and update the parser to cope. Miguel's example
is useful for testing while working on a bug fix, but too big to
include as part of the unit tests.

> I think there isn't any problem with how Biopython wraps
> the blastpgp process. Maybe, but I'm not sure, the difference in the
> output is related to small differences in the parameters of the process
> (or in the environment, or in the .ncbirc file).
>
> I have always observed identical blastpgp output
> via the shell (bash) and via the popen wrapper.

If you saw "Query 0" output at the command line (shell), then that is
worth knowing.
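
One way to check whether the two runs really are identical is to save
both outputs and diff them. This is only a minimal sketch: the function
name diff_outputs and the "shell"/"wrapper" labels are made up for
illustration, and it assumes both outputs have already been read into
strings.

```python
import difflib


def diff_outputs(shell_text, wrapper_text):
    """Return unified-diff lines between two saved BLAST outputs.

    An empty list means the two texts are identical line by line.
    """
    return list(
        difflib.unified_diff(
            shell_text.splitlines(),
            wrapper_text.splitlines(),
            fromfile="shell",      # hypothetical label for the shell run
            tofile="wrapper",      # hypothetical label for the popen run
            lineterm="",
        )
    )
```

An empty result would suggest the difference lies elsewhere (parameters,
environment, or the .ncbirc file) rather than in how the process is wrapped.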

> Miguel, could you check whether everything really is identical? I'm
> really surprised by such strange behaviour.

Maybe the environment variables are different or something?

> Although in my opinion there isn't any problem in Biopython, and maybe
> Miguel will be able to discover some difference in the way blastpgp is
> launched, I would suggest developing a patch (I could submit mine) that
> removes the "Query 0" lines.

Could you upload your "Query 0" patch to Bug 2927?
http://bugzilla.open-bio.org/show_bug.cgi?id=2927
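
For anyone following along, the kind of pre-filter being discussed might
look roughly like the sketch below. This is not Andrea's patch. It
assumes the offending lines start with "Query: 0" (a start coordinate of
zero), which is a guess since no example output has been posted yet, and
the names QUERY_ZERO and drop_query_zero_lines are hypothetical. Whether
the accompanying match and Sbjct lines also need to be dropped is not
clear from this thread, so the sketch leaves them alone.

```python
import re

# Assumption: a "Query 0" line starts with "Query:" (or "Query") followed
# by a start coordinate of exactly 0. The real blastpgp layout may differ.
QUERY_ZERO = re.compile(r"^Query:?\s+0(\s|$)")


def drop_query_zero_lines(lines):
    """Yield every line except those that look like a 'Query 0' line."""
    for line in lines:
        if not QUERY_ZERO.match(line):
            yield line
```

The filtered lines could then be joined and wrapped in a StringIO handle
before being handed to the parser.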

> I apologize if I have misunderstood the problem, and for
> entering the discussion at this point (maybe when the
> discussion is already finished)...

Well I don't (yet) understand what the problem is either ;)

Peter


