[BioPython] HSPs in Blast parser
    Brad Chapman 
    chapmanb at uga.edu
       
    Wed May  5 14:56:19 EDT 2004
    
    
  
Hi Jawad;
Sorry for the delay in getting back with you.
> I am stuck on parsing a BlastN output and would appreciate some help. 
> I am working on multiple HSPs for a single hit . For example if there 
> are two hsps found for one hit, I need to find where query and subject 
> ends for one hsp and then compare it with the query and subject start 
> for the next hsp
> 
> I have noticed that in the blast parser one can iterate through each 
> hsp for every single hit, but am not too sure how to treat two hsps of 
> a single hit as related and iterate through the two hsps of a single hit 
> in order to find the query (and subject) end of one and query (and subject) 
> start of the other.
You just need to iterate through each alignment and then through the
HSPs in each alignment. Then you can collect up the start and end
coordinates of each HSP and associate them with the overall
alignment. So, step by step:
1. Iterate through each alignment.
2. Create lists of HSP information for queries and subjects
  2a. Iterate through each HSP
  2b. Get the start and end coordinates from the hsp.query_start and
      hsp.sbjct_start attributes of the hsps
  2c. Calculate the HSP ends using the query and subject
      information, removing gaps added by BLAST
  2d. Add each start, end to the lists
3. You have the HSP lists, associated with the alignment title.
Here is some code which demonstrates this:
from Bio.Blast import NCBIStandalone
parser = NCBIStandalone.BlastParser()
rec = parser.parse(open("blast.out"))
for align in rec.alignments:
    query_hsps = []
    sbjct_hsps = []
    for hsp in align.hsps:
        # adjust query starts to python (0-based) coordinates
        query_start = hsp.query_start - 1
        # get the end from the length minus gaps
        query_end = query_start + len(hsp.query.replace("-", ""))
        sbjct_start = hsp.sbjct_start - 1
        sbjct_end = sbjct_start + len(hsp.sbjct.replace("-", ""))
        query_hsps.append((query_start, query_end))
        sbjct_hsps.append((sbjct_start, sbjct_end))
    print "Hit", align.title
    print "Query HSPs", query_hsps
    print "Subject HSPs", sbjct_hsps
Hope this helps!
Brad
    
    
More information about the BioPython
mailing list