[BioPython] Better blasting with XML
Peter Maxwell
maxwell@biolateral.com.au
Wed, 21 Aug 2002 16:35:29 +1000
Hi all,
Biopython's blast parser keeps breaking. This is normal behaviour for a blast
parser. To quote the blast2 release notes:
"The BLAST report is not intended to be a
parseable document. It is subject to change
with little or no notice."
Blast can produce parser-friendly XML or tabular output, so there is no need to
battle with the traditional blast report format. I attempted to use
Bio.Blast.NCBIWWW.blast(..., format_type='xml') to fetch XML-formatted
output from NCBI, but that doesn't work. I think the function makes
assumptions about what will appear on the returned web pages that aren't true
when format_type isn't 'html'.
There is another function in there, blasturl(), but the 'stable URL' it uses
is based on the old email blast interface and so predates format_type and
other recent blast features.
So I wrote something more or less equivalent to NCBIWWW.py using the 'new
stable URL' http://www.ncbi.nlm.nih.gov/blast/Blast.cgi, documentation for
which lives at http://www.ncbi.nlm.nih.gov/BLAST/Doc/urlapi.html. This is
the same interface (also known as QBlast) that Bio.Blast.NCBIWWW.blast()
uses, but since my version doesn't try to parse web forms it is a bit more
flexible and reliable.
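To illustrate the idea, here is a minimal sketch of how the two QBlast requests (submit, then fetch) can be built. It is not the code in NCBI.py. The function names are made up for this example, the parameter names (CMD, PROGRAM, DATABASE, QUERY, RID, FORMAT_TYPE) and the "RID = ..." / "RTOE = ..." lines in the reply are taken from my reading of the URL API documentation, and it is written for a current Python 3 rather than the Python of this post. It only builds URLs and reads a reply; it does not contact NCBI.

```python
import re
from urllib.parse import urlencode

# The 'new stable URL' mentioned above.
BLAST_CGI = "http://www.ncbi.nlm.nih.gov/blast/Blast.cgi"


def put_url(query, program="blastn", database="nr"):
    """Build the URL that submits a search (CMD=Put). Hypothetical helper."""
    params = {
        "CMD": "Put",
        "PROGRAM": program,
        "DATABASE": database,
        "QUERY": query,
    }
    return BLAST_CGI + "?" + urlencode(params)


def get_url(rid, format_type="XML"):
    """Build the URL that fetches results for a request ID (CMD=Get)."""
    params = {"CMD": "Get", "RID": rid, "FORMAT_TYPE": format_type}
    return BLAST_CGI + "?" + urlencode(params)


def parse_qblast_info(page):
    """Pull the request ID and estimated wait (seconds) out of a Put reply.

    Assumes the reply carries 'RID = ...' and 'RTOE = ...' lines, so no
    web form has to be parsed.
    """
    rid = re.search(r"RID = (\S+)", page)
    rtoe = re.search(r"RTOE = (\d+)", page)
    if rid is None:
        raise ValueError("no RID found in QBlast reply")
    return rid.group(1), int(rtoe.group(1)) if rtoe else None
```

A caller would fetch put_url(...), pass the reply to parse_qblast_info(), wait roughly the estimated time, and then fetch get_url(rid) until the results are ready.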
I also wrote the XML blast output parser I needed. It doesn't make an object
with the same interface as the current biopython blast parser because that
turned out to be too hard, the interface being very much influenced by the
details of the traditional blast report. The XML schema is simpler: it is
directly based on the ASN.1 schema which in turn is very close to the C data
structures in the blast code itself.
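As a rough illustration of why the XML is easier to work with, here is a small sketch that walks the Hit and Hsp elements with Python's standard xml.etree.ElementTree. It is not the NCBIXML.py parser and does not reproduce its interface. The function name and the returned dictionary layout are invented for this example, the sample document is made-up data trimmed to a handful of fields, and the element names are the ones I understand the BLAST XML output to use.

```python
import xml.etree.ElementTree as ET

# Made-up, heavily trimmed sample in the shape of BLAST XML output.
SAMPLE = """<BlastOutput>
  <BlastOutput_program>blastn</BlastOutput_program>
  <BlastOutput_iterations>
    <Iteration>
      <Iteration_hits>
        <Hit>
          <Hit_id>gi|0|example</Hit_id>
          <Hit_def>made-up hit</Hit_def>
          <Hit_len>100</Hit_len>
          <Hit_hsps>
            <Hsp>
              <Hsp_bit-score>50.1</Hsp_bit-score>
              <Hsp_evalue>1e-05</Hsp_evalue>
            </Hsp>
          </Hit_hsps>
        </Hit>
      </Iteration_hits>
    </Iteration>
  </BlastOutput_iterations>
</BlastOutput>"""


def summarize_hits(xml_text):
    """Return one dict per Hit with its id, definition, length and best e-value."""
    root = ET.fromstring(xml_text)
    hits = []
    for hit in root.iter("Hit"):
        # Each Hit holds one or more Hsp (high-scoring pair) elements.
        evalues = [float(hsp.findtext("Hsp_evalue")) for hsp in hit.iter("Hsp")]
        hits.append({
            "id": hit.findtext("Hit_id"),
            "definition": hit.findtext("Hit_def"),
            "length": int(hit.findtext("Hit_len")),
            "best_evalue": min(evalues) if evalues else None,
        })
    return hits
```

Because the element names mirror the underlying data structures, the code reads fields by name instead of matching the layout of a text report.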
Available at:
http://www.biolateral.com.au/download/NCBI.py
http://www.biolateral.com.au/download/NCBIXML.py
The code is GPL'ed for general distribution but I (and BioLateral) would be
happy to see any of it find its way into biopython so it is also available to
the biopython project for integration into biopython under biopython's
licence.
Cheers,
-- Peter Maxwell