[Biopython] traditional NCBI blast vs. blast+

Peter Cock p.j.a.cock at googlemail.com
Thu Mar 3 10:27:54 UTC 2011


On Wed, Mar 2, 2011 at 11:00 PM, Martin Mokrejs
<mmokrejs at fold.natur.cuni.cz> wrote:
> Hi,
>  I needed to run and parse some blastn analysis. I had a look into the Tutorial
> and followed the currently recommended blast+ approach. Somewhat I was not
> getting any results. It seems to me a formatdb-formatted database is not readable
> by the blast+ tools.

I think it is possible to get databases which will work with both legacy BLAST
and BLAST+ (since the NCBI only offer one set for NR etc) but I have not tried
to mix the two. As pointed out by Paulo, the successor to formatdb in BLAST+
is makeblastdb, so just use that instead.
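
For illustration, a minimal sketch of calling makeblastdb from Python. The
file name "targets.fasta" and both helper function names are hypothetical;
only the -in, -dbtype and -out options of makeblastdb are assumed, and the
tool must be on your PATH for build_database to work.

```python
import subprocess


def makeblastdb_command(fasta_path, dbtype="nucl", out=None):
    """Return the argument list for a makeblastdb call (hypothetical helper)."""
    if dbtype not in ("nucl", "prot"):
        raise ValueError("dbtype must be 'nucl' or 'prot'")
    cmd = ["makeblastdb", "-in", fasta_path, "-dbtype", dbtype]
    if out is not None:
        # Without -out, makeblastdb names the database after the input file.
        cmd += ["-out", out]
    return cmd


def build_database(fasta_path, dbtype="nucl", out=None):
    """Run makeblastdb; raises CalledProcessError if the tool reports failure."""
    subprocess.run(makeblastdb_command(fasta_path, dbtype, out), check=True)


# Example (not executed here): build_database("targets.fasta")
```

Keeping the command as a list rather than a shell string avoids quoting
problems with file names that contain spaces.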

> I had a look what tools are installed on my Gentoo Linux
> along with blastn, blastx and the other tools coming from blast+ bundle and from
> filenames I just could not guess what am I supposed to run over my fasta
> target database to make it searchable by blastn.

The BLAST+ documentation on the NCBI website (link given below) makes this
very clear; the missing documentation is arguably a Gentoo packaging issue.

> I would prefer if biopython
> would throw out some error if there are no appropriate files (which names could
> be guessed depending on the (t)blastn/x/p, etc.).

BLAST+ itself generally gives useful errors.
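
If you do want an early check on the Python side, something like the rough
sketch below would do. This is not something Biopython provides; the function
name is hypothetical. It assumes the usual file extensions for a single-volume
database (.nhr/.nin/.nsq for nucleotide, .phr/.pin/.psq for protein) and treats
an alias file (.nal or .pal) as sufficient for a multi-volume database. It only
looks at the path given and does not search the BLASTDB environment variable.

```python
import os

# Assumed core file extensions for a single-volume BLAST database.
CORE_EXTENSIONS = {
    "nucl": (".nhr", ".nin", ".nsq"),
    "prot": (".phr", ".pin", ".psq"),
}
# Alias files used by multi-volume databases.
ALIAS_EXTENSIONS = {"nucl": ".nal", "prot": ".pal"}


def missing_db_files(db_path, dbtype="nucl"):
    """Return the expected database files that are absent (hypothetical helper).

    An empty list means the database looks complete for this simple check.
    """
    if os.path.exists(db_path + ALIAS_EXTENSIONS[dbtype]):
        return []
    return [
        db_path + ext
        for ext in CORE_EXTENSIONS[dbtype]
        if not os.path.exists(db_path + ext)
    ]
```

A caller could raise its own error when the returned list is not empty,
before starting a search that would otherwise come back with nothing useful.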

>  The tutorial mentions that I should lookup an older version of the Tutorial
> for examples on the old, NCBI blast usage via biopython. It took me a while but
> I found through Google some docs like that. ;-)

You could have just downloaded one of the old Biopython releases (the zip
or tar balls) and looked in the Doc subdirectory. I'll clarify the current text
in the tutorial to point people there.

> On Gentoo the ncbi-tools++ (aka blast+) package installs no documentation,
> not a single README, HOWTO, Changes, just the binaries and libs.

File a bug with Gentoo?

> What is installed
> on other Linux platform, would you mind sharing this with me? I just failed
> to find by Google what tools should I use instead of the formatdb. I found
> some FAQ on the NCBI tools++ site but that talked just about C++ API etc.,
> nothing from the user perspective.

You are probably looking for this, linked to from the BLAST+ download page:
ftp://ftp.ncbi.nlm.nih.gov/blast/executables/blast+/LATEST/user_manual.pdf

> On Gentoo, the {asn2asn,rpsblast,test_regexp} from ncbi-tools++ is not being
> installed because they have same name as the same utility from "old" ncbi-tools
> (hence overwriting their files). The ncbi-tools++ package is not allowed to be
> installed on stable "systems" (lack of testing or open bug reports) so most people
> using Gentoo do NOT have ncbi-tools++ and probably won't for a while.

I was aware of the name clash for rpsblast, and yes, this is a problem the
NCBI could have avoided.

You could just ignore the Gentoo package and get BLAST+ directly from
the NCBI.

> I propose to keep support for the "old" blast for a long while.

We've already delayed deprecating the ``legacy'' BLAST wrappers,
but probably we should do that after releasing Biopython 1.57.

> Luckily, the
> blastall -m 7 xml output seems to be parseable with Bio.Blast.NCBIXML.

The NCBI kept the same XML output format, and in fact the plain text
output is close enough that our old text parser could be updated to cope.
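
As a sketch of what that means in practice, the same parsing code can read XML
from either generation of the tools. The helper names, the file names and the
1e-5 cutoff are hypothetical choices for this example; the options used are
"-m 7" for blastall and "-outfmt 5" for BLAST+, and the parsing function needs
Biopython installed.

```python
def blast_xml_command(program, query, db, out, legacy=False):
    """Return an argument list producing XML output (hypothetical helper)."""
    if legacy:
        # Legacy BLAST: blastall with -p <program> and -m 7 for XML.
        return ["blastall", "-p", program, "-i", query, "-d", db,
                "-m", "7", "-o", out]
    # BLAST+: one executable per program, -outfmt 5 for XML.
    return [program, "-query", query, "-db", db, "-outfmt", "5", "-out", out]


def significant_hits(xml_path, max_expect=1e-5):
    """Collect (query, hit title, e-value) for HSPs at or below max_expect."""
    from Bio.Blast import NCBIXML  # imported here so the rest works without Biopython

    hits = []
    with open(xml_path) as handle:
        for record in NCBIXML.parse(handle):
            for alignment in record.alignments:
                for hsp in alignment.hsps:
                    if hsp.expect <= max_expect:
                        hits.append((record.query, alignment.title, hsp.expect))
    return hits


# Example (not executed here):
#   subprocess.run(blast_xml_command("blastn", "query.fasta", "targets", "out.xml"), check=True)
#   for query, title, expect in significant_hits("out.xml"): ...
```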

> What do you think? Is the blast+ approach faster, more stable, or just newer
> so we all like to "upgrade"?

I like BLAST+ for some new functionality (FASTA vs FASTA for example),
but since the NCBI is dropping the ``legacy'' BLAST you will have to
upgrade at some point.
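
A FASTA vs FASTA search needs no formatted database at all, because BLAST+
accepts a second FASTA file through its -subject option in place of -db. A
minimal sketch, with a hypothetical helper name and hypothetical file names:

```python
def fasta_vs_fasta_command(query_fasta, subject_fasta, out, program="blastn"):
    """Return an argument list for a database-free BLAST+ search (hypothetical helper)."""
    return [
        program,
        "-query", query_fasta,
        "-subject", subject_fasta,  # FASTA file searched directly, instead of -db
        "-outfmt", "5",             # XML output
        "-out", out,
    ]


# Example (not executed here):
#   import subprocess
#   subprocess.run(fasta_vs_fasta_command("a.fasta", "b.fasta", "a_vs_b.xml"), check=True)
```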

> Where are some docs and what is the formatdb-like tool in blast+. ;)

I've given links to the docs above; they're linked from the NCBI website.

Regards,

Peter



