[BioPython] blast parsing errors
Michiel Jan Laurens de Hoon
mdehoon at c2b2.columbia.edu
Mon Mar 5 15:36:43 UTC 2007
Julius Lucks wrote:
> seq = 'ATCG'
> ...
> fasta_rec.sequence = seq
> ...
> result_handle = NCBIWWW.qblast
> ('blastp','nr',fasta_rec,ncbi_gi=1,expect=cutoff,format_type="XML",entre
You have a nucleotide sequence but are running a protein-protein blast
with blastp. If you run this exact search with Blast through a browser,
it will show you an error message. The function
_parse_qblast_ref_page(handle), which is called from NCBIWWW.qblast,
chokes on this error message. If you want to make this more robust, one
solution might be to check for error messages returned by the Blast
server in _parse_qblast_ref_page.
By the way, the code can be simplified as follows:
from Bio.Blast import NCBIWWW
from Bio.Blast import NCBIXML
#BLAST cutoff
cutoff = 1e-4
#Create a fasta record: title and seq are given
seq = 'ATCG'
b_parser = NCBIXML.BlastParser()
result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1,
expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")
b_records = b_parser.parse(result_handle)
b_record = b_records[0]
for alignment in b_record[0].alignments:
titles = alignment.title.split('>')
print titles
--------------------------------------------
Note: the BlastParser currently in CVS returns a list of Blast records
instead of a single Blast record, hence the b_records[0] above.
Btw, with NCBIXML currently in CVS, you don't need to create b_parser first:
result_handle = NCBIWWW.qblast('blastn', 'nr', seq, ncbi_gi=1,
expect=cutoff, format_type="XML", entrez_query="Viruses [ORGN]")
b_records = NCBIXML.parse(result_handle)
b_record = b_records.next()
-----------------------------------------------
--Michiel.
--
Michiel de Hoon
Center for Computational Biology and Bioinformatics
Columbia University
1130 St Nicholas Avenue
New York, NY 10032
More information about the Biopython
mailing list