[BioPython] blastcl3

Peter biopython at maubp.freeserve.co.uk
Tue Dec 30 17:42:14 UTC 2008


On Tue, Dec 30, 2008 at 12:39 AM, Srinivas Iyyer
<srini_iyyer_bio at yahoo.com> wrote:
> Dear Group,
> I am using netblast blastcl3 to blast my small fasta sequences to human genome.
>
> blastcl3 -p blastn -i test.fa -d gpipe/9606/all_contig  -o test.out
>
> Above is my command. I want to be able to parse the output which is a text based format.

I would urge you to tell blast to produce XML output as already
described by Brad.
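
As a rough sketch, the command from the question could be rebuilt from Python so that it asks for XML. This assumes the legacy blastcl3 client accepts the same -m alignment-view option as blastall, where -m 7 selects XML. The helper names blastcl3_xml_command and run_blastcl3 are hypothetical. Building the list does not run anything; you would call run_blastcl3 yourself.

```python
import subprocess

def blastcl3_xml_command(query_fasta, database, output_file, program="blastn"):
    """Build a blastcl3 argument list that requests XML output.

    Assumption: blastcl3 takes blastall's -m option, where 7 means XML.
    """
    return [
        "blastcl3",
        "-p", program,      # BLAST program, e.g. blastn
        "-i", query_fasta,  # input FASTA file
        "-d", database,     # database name
        "-o", output_file,  # where to write the report
        "-m", "7",          # alignment view 7 = XML (assumed, see above)
    ]

def run_blastcl3(cmd):
    """Run the command; needs blastcl3 on your PATH and network access."""
    subprocess.run(cmd, check=True)

cmd = blastcl3_xml_command("test.fa", "gpipe/9606/all_contig", "test.xml")
print(" ".join(cmd))
```

Once test.xml exists, open it and hand the file handle to the parser in Bio.Blast.NCBIXML.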

Just to clarify:
Bio.Blast.NCBIXML includes our XML blast parser (recommended)
Bio.Blast.NCBIStandalone includes our plain text parser (discouraged)
Bio.Blast.NCBIWWW includes our deprecated HTML blast parser

The module naming reflects the historical introduction of the
different BLAST tools, and is unfortunately a little misleading
nowadays since both the standalone command line tool and the website
can produce XML, plain text or HTML output.

> I used this:
> from Bio.Blast import NCBIWWW
> import Bio.Blast.Record
> blast_out = open('test.out','r')
> parser = NCBIWWW.BlastParser()
> blastRecord = parser.parse(blast_out)

The above code will try to parse HTML (web page) format BLAST output,
but you said test.out should be in plain text format, so this won't
work.  If you really want to use the plain text format, try the parser
in Bio.Blast.NCBIStandalone, but it does not cope with all of the output
from the latest version of the BLAST standalone tools.

> Instead I did the following:
>
> from Bio.Blast import NCBIWWW
> from Bio.Blast import NCBIXML
>
> fasta_string = open("test.fa").read()
> result_handle = NCBIWWW.qblast("blastn", "gpipe/9606/all_contig", fasta_string)

This function runs BLAST over the internet, and it should default to
XML format.  You can override this using the format_type argument as
described in the docstring or the tutorial.  You should be able to
parse the result using Bio.Blast.NCBIXML as you tried...
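
Bio.Blast.NCBIXML remains the recommended route. Purely to show why XML is easier to handle than plain text, here is a stdlib-only sketch (not Biopython's API) that pulls each hit's description and best E-value out of a BLAST XML report with xml.etree.ElementTree. It assumes the standard NCBI element names (Hit, Hit_def, Hsp_evalue); the function name summarise_hits and the trimmed sample report are made up for illustration.

```python
import io
import xml.etree.ElementTree as ET

def summarise_hits(handle):
    """Return (hit description, lowest HSP E-value) pairs from BLAST XML."""
    tree = ET.parse(handle)
    summary = []
    for hit in tree.iter("Hit"):
        description = hit.findtext("Hit_def")
        evalues = [float(e.text) for e in hit.iter("Hsp_evalue")]
        # A hit without HSPs gets None rather than raising on min([]).
        summary.append((description, min(evalues) if evalues else None))
    return summary

# Hypothetical, heavily trimmed report used only to exercise the function.
SAMPLE = """<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>example contig 1</Hit_def>
          <Hit_hsps>
            <Hsp><Hsp_evalue>1e-30</Hsp_evalue></Hsp>
            <Hsp><Hsp_evalue>0.002</Hsp_evalue></Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>
"""

print(summarise_hits(io.StringIO(SAMPLE)))
```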

However, I would assume that "gpipe/9606/all_contig" is a local
database on your machine, so there is no way NCBI's servers can
use it.  If you examine the results by hand, they will probably be an
error message.  Try this:

print(result_handle.read())
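
Because an XML parser can only make sense of a genuine BLAST XML report, a cheap sanity check is to look at the start of what came back before parsing it. This is a minimal sketch with a made-up helper name (looks_like_blast_xml); it only checks for an XML declaration and a BlastOutput root element, so it is a heuristic rather than a validator. Reading the handle consumes it, so the sketch reads once into a string and wraps that string for parsing.

```python
import io

def looks_like_blast_xml(text):
    """Heuristic: does this text look like an NCBI BLAST XML report?"""
    head = text.lstrip()[:500]
    return head.startswith("<?xml") and "<BlastOutput" in head

# Hypothetical error page standing in for result_handle.read():
returned = "<html><body>Error: database not found</body></html>"
if looks_like_blast_xml(returned):
    handle = io.StringIO(returned)  # hand this to your XML parser
else:
    print("Not BLAST XML, first 200 characters:")
    print(returned[:200])
```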

Peter


