[BioPython] target sequence length in blast parsing
Michiel de Hoon
mdehoon at c2b2.columbia.edu
Sun Jan 7 17:06:38 UTC 2007
There are two things you can do:
First, try dir(record) on the Blast record to see whether the information
you are looking for is hiding in one of its attributes.
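As a rough sketch of that kind of inspection: the snippet below filters the
output of dir() down to public attribute names. FakeRecord is a hypothetical
stand-in, not a Biopython class, and public_attributes is a helper invented
for this illustration; with a real parse you would pass in the record
returned by the Blast parser instead.

```python
def public_attributes(obj):
    """Return the attribute names of obj that do not start with an underscore."""
    return [name for name in dir(obj) if not name.startswith("_")]


class FakeRecord(object):
    """Hypothetical stand-in for a parsed Blast record (illustration only)."""

    def __init__(self):
        self.query = "example query"
        self.query_letters = 120
        self.alignments = []


# dir() returns names in sorted order, so the result is sorted as well.
names = public_attributes(FakeRecord())
print(names)
```

The same helper can be applied to the objects stored inside a record (for
example, each alignment), since the value you want may live one level down.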
If you can't find it there, the following should work, assuming that you
parse Blast XML output instead of Blast plain-text output (the latter
may or may not work):
>>> from Bio.Blast import NCBIXML
>>> inputfile = open("myblastoutput.xml")
>>> records = NCBIXML.parse(inputfile)  # iterator over the Blast records
>>> for record in records:
...     print(record.query_letters)  # number of letters in the query
...
>>> inputfile.close()
Two caveats:
1) This uses the latest Blast parsing code in CVS; it is not in
Biopython release 1.42. You can download the new files in Bio/Blast/*.py
from CVS and just copy them over the corresponding files of release 1.42
to make this work.
2) Jacob Joseph makes the (I believe correct) argument that there are
some inconsistencies between variable names in the Biopython blast
parsers. So record.query_letters may have a different name in a future
Biopython release. See Bug #2176 on Bugzilla for more information.
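
Note that record.query_letters is the length of the query. The length of each
target (hit) sequence is stored in the Blast XML itself, in the <Hit_len>
element of every <Hit>. If the parser in your Biopython release does not
expose it, a fallback is to read the XML directly with the Python standard
library. The following is only a sketch of that alternative, not Biopython
code: hit_lengths is a made-up helper name, and the XML is a hand-written
fragment trimmed to the two elements used here, with values taken from the
example in the question.

```python
import io
import xml.etree.ElementTree as ET


def hit_lengths(handle):
    """Return a list of (hit definition, hit length) for every <Hit> element."""
    root = ET.parse(handle).getroot()
    return [
        (hit.findtext("Hit_def"), int(hit.findtext("Hit_len")))
        for hit in root.iter("Hit")
    ]


# Hand-written fragment for illustration; real Blast XML has many more elements.
sample = io.BytesIO(b"""<?xml version="1.0"?>
<BlastOutput>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_def>putative carboxypeptidase D [Oryza sativa (japonica cultivar-group)]</Hit_def>
          <Hit_len>669</Hit_len>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>""")

results = hit_lengths(sample)
for definition, length in results:
    print(definition, length)
```

For a real report, pass an open file handle (for example
open("myblastoutput.xml", "rb")) instead of the in-memory sample. In
Biopython's own record objects the same number is, as far as can be told from
later releases, available as the length attribute of each alignment; treat
that as an assumption and confirm it with dir() on your installed version.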
--Michiel.
Ann Loraine wrote:
> Dear all,
>
> I have a question about blast parsing in biopython - any tips would be
> much appreciated.
>
> How can I access the length of the target sequence (e.g., 669 in the
> following text) from alignment (or other?) objects retrieved from a
> blast report parse?
>
> ** example **
>
>> gi|34908012|ref|NP_915353.1| putative carboxypeptidase D
> [Oryza sativa (japonica cultivar-group)]
> Length = 669
>
> Score = 247 bits (625), Expect(2) = 3e-71
> Identities = 108/167 (64%), Positives = 132/167 (78%)
> Frame = +2
>
> Yours,
>
> Ann
>