[BioPython] blast parsing errors

Mon Mar 5 15:36:43 UTC 2007

Julius Lucks wrote:
> seq = 'ATCG'
 > ...
> fasta_rec.sequence = seq
> ...
> result_handle = NCBIWWW.qblast 
> ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre 
You have a nucleotide sequence but are running a protein-protein blast 
with blastp. If you run this exact search with Blast through a browser, 
it will show you an error message. The function 
_parse_qblast_ref_page(handle), which is called from NCBIWWW.qblast, 
chokes on this error message. If you want to make this more robust, one 
solution might be to check for error messages returned by the Blast 
server in _parse_qblast_ref_page.

By the way, the code can be simplified as follows:
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML

#BLAST cutoff
cutoff = 1e-4

#Create a fasta record: title and seq are given
seq = 'ATCG'

b_parser = NCBIXML.BlastParser()

result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, 
expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")

b_records = b_parser.parse(result_handle)
b_record = b_records[0]

for alignment in b_record[0].alignments:
      titles = alignment.title.split('>')
      print titles

--------------------------------------------

Note: the BlastParser currently in CVS returns a list of Blast records 
instead of a single Blast record, hence the b_records[0] above.

Btw, with NCBIXML currently in CVS, you don't need to create b_parser first:

result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1, 
expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")

b_records = NCBIXML.parse(result_handle)
b_record = b_records.next()

-----------------------------------------------

--Michiel.

-- 
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032