[BioPython] target sequence length in blast parsing

Sun Jan 7 17:06:38 UTC 2007

There are two things you can do:

First, try dir(record) on the Blast record to see if the information you 
are looking for is hiding in one of those variables.
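
As a rough sketch of that first step, something like the following lists the public attribute names on any object. The helper name public_attributes and the FakeRecord class are made up for illustration; a real Blast record will show different names.

```python
def public_attributes(obj):
    """Return the names from dir(obj) that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class FakeRecord(object):
    """Stand-in for a parsed Blast record (hypothetical, illustration only)."""

    def __init__(self):
        self.query = "my query"
        self.alignments = []


# With a real record, replace FakeRecord() by the object returned by the parser.
print(public_attributes(FakeRecord()))
```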

If you can't find it there, the following should work, assuming that you 
parse Blast XML output instead of Blast plain-text output (the latter 
may or may not work):

 >>> from Bio.Blast import NCBIXML
 >>> inputfile = open("myblastoutput.xml")
 >>> records = NCBIXML.parse(inputfile)
 >>> for record in records:
 ...     print(record.query_letters)  # length of the query sequence
 ...
 >>> inputfile.close()
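
Note that query_letters is the length of the query. For the length of each target (hit) sequence, the alignment objects in record.alignments should carry a length attribute next to title. A minimal sketch, assuming those attribute names are present in the parser version you have; the helper names target_lengths and print_target_lengths are made up for this example:

```python
def target_lengths(records):
    """Yield (title, length) for every alignment in every Blast record.

    Assumption: each record has an `alignments` list whose items expose
    `title` and `length`, as Bio.Blast.Record.Alignment does.
    """
    for record in records:
        for alignment in record.alignments:
            yield alignment.title, alignment.length


def print_target_lengths(path):
    """Parse a Blast XML file and print the length of each target."""
    # Imported here so that target_lengths itself has no Biopython dependency.
    from Bio.Blast import NCBIXML

    with open(path) as handle:
        for title, length in target_lengths(NCBIXML.parse(handle)):
            print("%s\t%d" % (title, length))
```

Calling print_target_lengths("myblastoutput.xml") would then print one line per hit.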

Two caveats:
1) This uses the latest Blast parsing code in CVS; it is not in 
Biopython release 1.42. You can download the new files in Bio/Blast/*.py 
from CVS and just copy them over the corresponding files of release 1.42 
to make this work.
2) Jacob Joseph makes the (I believe correct) argument that the variable 
names in the Biopython blast parsers are somewhat inconsistent, so 
record.query_letters may have a different name in a future Biopython 
release. See Bug #2176 on Bugzilla for more information.
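
If copying the CVS files over release 1.42 is not an option, the target lengths can also be pulled straight out of the XML with the Python standard library. This is only a sketch: it assumes the usual NCBI Blast XML layout, in which each <Hit> element has <Hit_def> and <Hit_len> children. The function name hit_lengths and the cut-down SAMPLE document are invented for the example.

```python
import io
import xml.etree.ElementTree as ET


def hit_lengths(source):
    """Return a list of (definition, length) pairs, one per <Hit> element.

    `source` is a filename or an open file object containing Blast XML.
    """
    results = []
    for hit in ET.parse(source).iter("Hit"):
        definition = hit.findtext("Hit_def")
        length = hit.findtext("Hit_len")
        if length is not None:  # skip hits without a recorded length
            results.append((definition, int(length)))
    return results


# Hand-written, heavily shortened stand-in for real Blast XML output.
SAMPLE = """<BlastOutput>
  <BlastOutput_iterations><Iteration><Iteration_hits>
    <Hit>
      <Hit_def>putative carboxypeptidase D</Hit_def>
      <Hit_len>669</Hit_len>
    </Hit>
  </Iteration_hits></Iteration></BlastOutput_iterations>
</BlastOutput>"""

print(hit_lengths(io.StringIO(SAMPLE)))
```

With a real report, hit_lengths("myblastoutput.xml") should work the same way.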

--Michiel.

Ann Loraine wrote:
> Dear all,
> 
> I have a question about blast parsing in biopython - any tips would be
> much appreciated.
> 
> How can I access the length of the target sequence (e.g., 669 in the
> following text) from alignment (or other?) objects retrieved from a
> blast report parse?
> 
> ** example **
> 
> >gi|34908012|ref|NP_915353.1| putative carboxypeptidase D [Oryza sativa (japonica
>            cultivar-group)]
>           Length = 669
> 
>  Score =  247 bits (625), Expect(2) = 3e-71
>  Identities = 108/167 (64%), Positives = 132/167 (78%)
>  Frame = +2
> 
> Yours,
> 
> Ann
>