[Biopython] parsing Blast results (xml)
Peter Cock
p.j.a.cock at googlemail.com
Thu Feb 2 11:09:54 EST 2012
On Thu, Feb 2, 2012 at 3:45 PM, Sarttu Bourvir <bpkth2012 at gmail.com> wrote:
> Hi,
> I am new to Biopython and am having problems parsing a BLAST result file (XML
> format).
> I can get out alignments, alignment length, title etc.
> But I would additionally need to print the query title, percent
> similarity, and e-value.
Well, the e-value is easy, and is covered in the tutorial, e.g.
    # blast_record is one record from the Bio.Blast.NCBIXML parser
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            print('****Alignment****')
            print('sequence: %s' % alignment.title)
            print('length: %s' % alignment.length)
            print('e value: %s' % hsp.expect)
            # Show the first 75 columns of the pairwise alignment
            print(hsp.query[0:75] + '...')
            print(hsp.match[0:75] + '...')
            print(hsp.sbjct[0:75] + '...')
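For percentage similarity I think you must use hsp.positives
divided by the length of the HSP alignment (hsp.align_length,
rather than alignment.length, which is the length of the matched
database sequence). Likewise hsp.identities can be used to get
the percentage identity.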
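
As a rough sketch of that calculation: the helper name hsp_percentages is made up for illustration, and the SimpleNamespace object below is only a stand-in with invented numbers, not real parser output. It assumes the HSP exposes identities, positives and align_length, which is what I would expect from the XML parser.

```python
from types import SimpleNamespace


def hsp_percentages(hsp):
    """Return (percent identity, percent similarity) for one HSP."""
    # Divide by the length of this HSP's alignment, not by alignment.length
    length = float(hsp.align_length)
    return (100.0 * hsp.identities / length,
            100.0 * hsp.positives / length)


# Stand-in for a parsed HSP; the numbers are invented for the example
fake_hsp = SimpleNamespace(identities=45, positives=60, align_length=75)
pct_identity, pct_similarity = hsp_percentages(fake_hsp)
print("identity: %.1f%%, similarity: %.1f%%" % (pct_identity, pct_similarity))
```

With a real record you would call hsp_percentages(hsp) inside the inner loop over alignment.hsps.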
> How does one do that? Is there anywhere other than the Biopython
> cookbook and help(Bio.Blast.NCBIXML.Record) to look for information?
I assume you know about dir(...) as well? e.g. try dir(hsp)
after the above example, or dir(alignment), to see what attributes
these objects have.
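
Since dir(...) also lists all the double-underscore methods, a small filter can make the output easier to read. This is only a sketch: public_attrs is a hypothetical helper name, and the SimpleNamespace object is a stand-in for an HSP with made-up values.

```python
from types import SimpleNamespace


def public_attrs(obj):
    """List attribute names from dir(obj), skipping private/special ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in for a parsed HSP; the values are invented for the example
fake_hsp = SimpleNamespace(expect=1e-5, identities=45, positives=60)
print(public_attrs(fake_hsp))
```

The same call should work on blast_record, alignment or hsp objects from the parser.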
> I feel like I don't really understand the
> Blast.Record and where in there things can be found.
> Is the sequence query title in the header?
Yes, the query details should be captured. Try dir(blast_record),
where blast_record is a Bio.Blast.Record.Blast object from the parser.
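
For instance, something along these lines might do for printing the query details. It is a sketch under assumptions: query_summary is a made-up helper, the attribute names query and query_letters are what I would expect the XML parser to set (do confirm with dir(blast_record)), and the SimpleNamespace object is a stand-in rather than real parser output.

```python
from types import SimpleNamespace


def query_summary(blast_record):
    """Return a one-line description of the query for one BLAST record."""
    # getattr with a default, because the attribute names are assumptions
    title = getattr(blast_record, "query", "<no query title>")
    letters = getattr(blast_record, "query_letters", None)
    if letters is None:
        return "query: %s" % title
    return "query: %s (%d letters)" % (title, letters)


# Stand-in for a parsed record; the values are invented for the example
fake_record = SimpleNamespace(query="my_test_sequence", query_letters=120)
print(query_summary(fake_record))
```

With real output, the records would come from Bio.Blast.NCBIXML.parse(handle), where handle is your open XML file.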
Peter