[BioPython] BLAST Parser query
Peter
biopython at maubp.freeserve.co.uk
Mon Feb 14 16:22:26 EST 2005
kumar s wrote:
> Hi group,
Hello!
> I used standalone BLAST to align my sequences to
> RefSeq database.
>
> I followed the sample code given in the cookbook.
>
> Here is my code:
(code removed)
I would suggest "while True" rather than "while 1" to take advantage
of the definition of these booleans in Python 2.3, but that is a
style point.
You also call b_iterator.next() twice at the start of the program
(once outside the loop, once on the first time through the loop) so
you will waste/ignore the first result.
i.e. You should comment out the line "b_record = b_iterator.next()"
before the while loop.
> I want to print my query along with my
> alignment.title, because I am blasting 1000 sequences
> against RefSeq and I want to know what the query.
I assume you are blasting with an input FASTA file with 1000
sequences it it.
When you say "query" what do you mean? I can think of three answers:
(a) The name of the original query:
b_record.query
(b) The ENTIRE original query (i.e. a complete sequence of
nucleotides or amino acids) is not included in the blast output
(look inside your file 'xxx_blast.out').
You will have to use the query name (i.e. b_record.query) to work
this out, for example by searching your FASTA input file.
(c) The FRAGMENT of the original query (i.e. a sequence of
nucleotides or amino acids) that was matched to the Blast database
is available, as:
hsp.query.replace('-','')
Here hsp.query gives the string including gap characters '-' to show
how this fragment of the query string aligns with the database match
(see also the hsp.match and hsp.sbjct properties).
Doing hsp.query.replace('-','') gives the fragment of the query
string without the gap characters.
e.g.
from Bio.Blast import NCBIStandalone
b_out = open('xxx_blast.out','r')
b_parser = NCBIStandalone.BlastParser()
b_iterator = NCBIStandalone.Iterator(b_out,b_parser)
#Do use the following line, as you end up calling
#next twice and never look at the first result!
#b_record = b_iterator.next()
#You only need to define the threshold once
E_VALUE_THRESH = 0.4
while True:
b_record = b_iterator.next()
if b_record is None:
break
print "The following results are for query " + b_record.query
for alignment in b_record.alignments:
for que in b_record.query:
for hsp in alignment.hsps:
if hsp.expect < E_VALUE_THRESH:
print '****Alignment****'
print 'title:', alignment.title
print 'length:', alignment.length
print 'e value:', hsp.expect
print 'fragment of query matched:'
print hsp.query.replace('-','')
I hope the indentation survives being emailed!
Peter
More information about the BioPython
mailing list