[Biopython] Parsing large blast files

Peter Cock p.j.a.cock at googlemail.com
Wed Apr 29 04:33:03 EDT 2009


On Wed, Apr 29, 2009 at 2:28 AM, Michiel de Hoon <mjldehoon at yahoo.com> wrote:
>
> How would users typically use Bio.Blast.Applications?
>

In the next release, I would aim to have Bio.Blast.Applications
updated to cover blastall (fully), plus blastpgp and rpsblast
(currently not covered) and for the three helper functions
Bio.Blast.NCBIStandalone.blastall, blastpgp and rpsblast to all use
Bio.Blast.Applications internally.  I would suggest at some point
(perhaps a release later) calling the three helper functions obsolete,
and eventually deprecating them, but I appreciate these are well
documented and well used, so this should be a gradual transition.

In the future I would see people constructing their application command
line object and then using it to spawn the task as needed.  The
Bio.Application.generic_run function might suffice for low output tools,
ranging up to using the built-in subprocess module for full control.
The command line string can also be used in other ways, e.g. for
submission to a computing cluster using qsub, or writing to a shell
script etc.
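
To make that concrete, here is a rough sketch using only the standard
library.  The SimpleCommandline class and the write_shell_script and run
helpers are made-up names for illustration, not the Bio.Application API,
and the database and file names are placeholders.  The idea is simply
that one object gives you the command line, and you choose what to do
with it afterwards.

```python
import shlex
import subprocess


class SimpleCommandline:
    """Hypothetical stand-in for an application command line object."""

    def __init__(self, program, **options):
        self.program = program
        self.options = dict(options)

    def args(self):
        """Return the command as a list of arguments (options sorted by name)."""
        result = [self.program]
        for name, value in sorted(self.options.items()):
            result.extend(["-" + name, str(value)])
        return result

    def __str__(self):
        # Quote each argument so the string is safe to paste into a shell.
        return " ".join(shlex.quote(arg) for arg in self.args())


def write_shell_script(cmd, path):
    """Save the command to a script, e.g. for later submission with qsub."""
    with open(path, "w") as handle:
        handle.write("#!/bin/sh\n%s\n" % cmd)


def run(cmd):
    """Execute the command directly, capturing stdout and stderr as text."""
    return subprocess.run(cmd.args(), capture_output=True, text=True)


# Build the command once; nothing is executed at this point.
cline = SimpleCommandline("blastall", p="blastn", d="nt", i="query.fasta")
print(cline)
```

Because building and running are separate steps, you can print the
string for debugging, write it to a script, or hand cline.args() to
subprocess with whatever settings your platform needs.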

The point about this is decoupling construction of the command line
string from actually executing it.  Right now the
Bio.Blast.NCBIStandalone.blastall, blastpgp and rpsblast functions do
both, and there is no way to (a) see what command line was used,
which makes debugging difficult, or (b) control how it is
invoked (e.g. recent Windows GUI questions).

Another immediate benefit is a use case I run into quite often:
running BLAST and saving the output to a file.  The cleanest way to do
this is to use the -o option to get BLAST itself to write to a file.
If you do this, then there is no useful output written to the handles
- but the Bio.Blast.NCBIStandalone functions make this fiddly (see Bug
2654).  Right now the tutorial does something equally indirect - in
Python, read the BLAST output from stdout and save it to a file (and
probably not in a memory efficient way either!).
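
As a rough illustration of the -o approach, assuming the legacy blastall
binary is on your PATH: the helper names blastall_to_file_args and
run_to_file below are invented for this sketch, and the database and
file names are placeholders.  BLAST writes the results file itself, so
Python never has to hold the output in memory.

```python
import subprocess


def blastall_to_file_args(program, database, query, out_path, xml=True):
    """Build a blastall argument list that makes BLAST write out_path itself."""
    args = ["blastall", "-p", program, "-d", database,
            "-i", query, "-o", out_path]
    if xml:
        # -m 7 asks blastall for XML output.
        args.extend(["-m", "7"])
    return args


def run_to_file(args):
    """Run the command and return its exit code.

    The results go to the file named after -o, so stdout carries nothing
    useful and is discarded; stderr is left alone so errors stay visible.
    """
    return subprocess.call(args, stdout=subprocess.DEVNULL)


# Placeholder inputs; nothing is executed until run_to_file(args) is called.
args = blastall_to_file_args("blastn", "nt", "query.fasta", "results.xml")
print(" ".join(args))
```

Afterwards you would open results.xml and parse it as usual, rather
than reading from the child process handles.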

See also this thread on where to put new command line wrappers:
http://lists.open-bio.org/pipermail/biopython-dev/2009-April/005766.html

If you were asking about the actual code for how to build the command
line object, well, I have some thoughts on making the current
Bio.Application base class easier to use (properties and keyword
arguments at init), which I have started to discuss on the dev list.

Peter

