[Biopython] parsing Blast results (xml)

Peter Cock p.j.a.cock at googlemail.com
Thu Feb 2 11:09:54 EST 2012


On Thu, Feb 2, 2012 at 3:45 PM, Sarttu Bourvir <bpkth2012 at gmail.com> wrote:
> Hi,
> I am new to biopython and having problems parsing a blast result file (xml
> format).
> I can get out alignments, alignment length, title etc.
> But I would additionally need to print the query title , percent
> similarity, e-value.

Well e-value is easy, and covered in the tutorial - e.g.

# blast_record is one record from Bio.Blast.NCBIXML.parse(...) or read(...)
for alignment in blast_record.alignments:
    for hsp in alignment.hsps:
        print('****Alignment****')
        print('sequence:', alignment.title)
        print('length:', alignment.length)
        print('e value:', hsp.expect)
        print(hsp.query[0:75] + '...')
        print(hsp.match[0:75] + '...')
        print(hsp.sbjct[0:75] + '...')

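For percentage similarity I think you must use hsp.positives
divided by the length of the HSP alignment (hsp.align_length,
not alignment.length, which is the length of the hit sequence).
Likewise hsp.identities can be used to get the percentage identity.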
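
As a rough sketch of that arithmetic: the helper name hsp_percentages
is made up for illustration, and the SimpleNamespace object is only a
stand-in with invented numbers so the snippet runs without a BLAST
file. With real data you would pass in an hsp from alignment.hsps.

```python
from types import SimpleNamespace


def hsp_percentages(hsp):
    """Return (percent identity, percent positives) for one HSP.

    Hypothetical helper, not part of Biopython. It only needs an object
    with identities, positives and align_length attributes.
    """
    length = hsp.align_length  # alignment columns, gaps included
    if not length:
        return 0.0, 0.0
    return 100.0 * hsp.identities / length, 100.0 * hsp.positives / length


# Stand-in for a real HSP; the numbers are invented for the example.
fake_hsp = SimpleNamespace(identities=45, positives=60, align_length=75)

pct_identity, pct_positives = hsp_percentages(fake_hsp)
print("identity: %.1f%%  positives: %.1f%%" % (pct_identity, pct_positives))
```

Dividing by the HSP alignment length rather than the query length is
a choice; use whichever denominator matches how you want to report
similarity.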

> How does one do that?  Is there anywhere else than Biopython
> cookbook and help(Bio.Blast.NCBIXML.Record) to look for information.

I assume you also know about dir(...) as well? e.g. try dir(hsp)
after the above example or dir(alignment) to see what attributes
these objects have.
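
The output of dir(...) also lists the special double-underscore
names, which can be noisy. A small sketch for hiding them follows;
public_attributes is a made-up helper name, and the object below is
just a stand-in so the snippet runs on its own.

```python
from types import SimpleNamespace


def public_attributes(obj):
    """List attribute names from dir(obj), skipping underscore-prefixed ones."""
    return [name for name in dir(obj) if not name.startswith("_")]


# Stand-in object; with real data, pass hsp, alignment or blast_record.
example = SimpleNamespace(expect=1e-5, identities=45)
print(public_attributes(example))
```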

> I feel like I don't really understand the
> Blast.Record and where in there things can be found.
> Is the sequence query title in the header?

Yes, the query details should be captured. Try dir(blast_record)
where blast_record is a Bio.Blast.Record.Blast object from the
parser; the query title should be in blast_record.query.
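
Putting the pieces together, something along these lines should
print the query title, hit title, e-value and percent similarity
for each HSP. This is only a sketch: summarize is a made-up name,
and the record is a hand-built stand-in with invented values that
mimics the attribute names (query, alignments, hsps, expect,
positives, align_length). With a real file the records would come
from Bio.Blast.NCBIXML.parse(handle) instead.

```python
from types import SimpleNamespace


def summarize(blast_record):
    """Yield (query title, hit title, e-value, percent positives) per HSP."""
    for alignment in blast_record.alignments:
        for hsp in alignment.hsps:
            pct_positives = 100.0 * hsp.positives / hsp.align_length
            yield blast_record.query, alignment.title, hsp.expect, pct_positives


# Hand-built stand-in for a parsed record; all values are invented.
hsp = SimpleNamespace(expect=1e-20, positives=60, align_length=75)
alignment = SimpleNamespace(title="example hit title", hsps=[hsp])
record = SimpleNamespace(query="example query title", alignments=[alignment])

rows = list(summarize(record))
for query, title, expect, pct_positives in rows:
    # One tab-separated line per HSP
    print("%s\t%s\t%g\t%.1f" % (query, title, expect, pct_positives))
```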

Peter


