[BioPython] More objects on BLAST parser

Tue Jun 8 00:20:25 EDT 2004

On Jun 7, 2004, at 11:54 PM, Sebastian Bassi wrote:

> Hello,
>
> I am working with this code:
>
> from Bio.Blast import NCBIStandalone
> b_parser = NCBIStandalone.BlastParser()
> bl = open("C:\\bioinfo-adv\\primervsclones\\MS1248-For", "r")
> b_record = b_parser.parse(bl)
> for alignment in b_record.alignments:
>     for hsp in alignment.hsps:
>         print alignment.title
>         print alignment.length
>         print hsp.expect
> bl.close()
>
> Works great. Now I need to know what other objects I can retrieve
> with this parser. For example, I'd like to retrieve:
> The length of the input/query sequence.

b_record.query_letters

> The length of the hit sequence.

This one is hard because it does not show up in the BLAST record.  BLAST
prints out only the piece of the subject that has high sequence
identity with the query sequence.  For BLAST2, you may be able to
calculate the length of the aligned region from the subject sequence
(hsp.sbjct) and its start position (hsp.sbjct_start) in each HSP.  Are
you sure you really need the length of this sequence?
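
A minimal sketch of that calculation, under some assumptions: the Hsp
tuple below is a hypothetical stand-in for the parser's HSP object so the
snippet runs without a BLAST file, the helper name subject_span is made
up, and the hit is assumed to be on the plus strand with one coordinate
per residue (as in blastn or blastp output).  Only the attribute names
sbjct and sbjct_start come from the parser.

```python
from collections import namedtuple

# Hypothetical stand-in for the parser's HSP object; only the two
# attributes used here (sbjct, sbjct_start) are modelled.
Hsp = namedtuple("Hsp", ["sbjct", "sbjct_start"])

def subject_span(hsp):
    """Return (start, end, n_residues) for the subject region of one HSP.

    Assumes a plus-strand hit, so coordinates increase along the alignment.
    """
    # Gap characters ('-') in the aligned subject string are not residues.
    n_residues = len(hsp.sbjct) - hsp.sbjct.count("-")
    end = hsp.sbjct_start + n_residues - 1
    return hsp.sbjct_start, end, n_residues

# Made-up alignment fragment, for illustration only.
example = Hsp(sbjct="ACGT-ACGTTA", sbjct_start=101)
span = subject_span(example)
print(span)
```

This only measures the part of the subject covered by the HSP, not the
full length of the hit sequence.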

> The identities (like 20/21).

hsp.identities
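
With the plain-text parser this should be a pair of integers (identical
positions, aligned length) rather than the string "20/21", but it is
worth confirming with help(hsp) on your version.  Assuming it is such a
pair, a small sketch for printing it the way BLAST does (the function
name format_identities is made up for this example):

```python
def format_identities(identities):
    """Format an (identical, aligned_length) pair in BLAST's style."""
    identical, aligned = identities
    # Use a float so the division also behaves under Python 2.
    percent = 100.0 * identical / aligned
    return "%d/%d (%.0f%%)" % (identical, aligned, percent)

# With real parser output this would be format_identities(hsp.identities).
text = format_identities((20, 21))
print(text)
```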

> The name of the query sequence.

b_record.query

> Another question: Is there a way (looking at the source code) to know 
> all the available objects?

One way without looking at the source code is to look at the docstring 
for the objects, e.g. help(b_record), help(alignment).  In the source, 
these classes are in Bio/Blast/Record.py.  In that file, each of the 
member variables is documented in the docstring.  I believe that for 
the standalone version of blast, we're parsing out every bit of 
information.  So if it's in the output, it's in the object somewhere!  
;)
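
For a quick listing at the interactive prompt, something along these
lines may also help.  It is only a sketch: FakeRecord is a made-up class
standing in for the parsed record so the snippet runs by itself, and it
assumes the object keeps its data in ordinary instance attributes.  With
real parser output you would pass b_record, alignment or hsp instead.

```python
class FakeRecord(object):
    """Made-up stand-in for a parsed BLAST record (illustration only)."""

    def __init__(self):
        self.query = "example query"
        self.query_letters = 21
        self.alignments = []

def public_attributes(obj):
    """Return the sorted names of obj's non-private instance attributes."""
    return sorted(name for name in vars(obj) if not name.startswith("_"))

names = public_attributes(FakeRecord())
print(names)
```

vars() shows only what has been set on that particular instance, so
help() on the object is still the better place to read what each
attribute means.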

Jeff