[Biopython] Legacy blastn XML outfile parsing is slow. What XML parser is actually used?

Peter Cock p.j.a.cock at googlemail.com
Tue Sep 25 12:00:45 EDT 2012


On Tue, Sep 25, 2012 at 3:39 PM, Tanya Golubchik
<golubchi at stats.ox.ac.uk> wrote:
> Hello,
>
> Apologies for not having followed the entire discussion, but just wanted
> to say that we're also using NCBIXML here and are likely to be
> incorporating it in a new piece of software soon, so it would be really
> unfortunate if some tags disappeared, were renamed or (even worse)
> changed meaning in future releases.
>
> I'm a bit late coming in here so maybe this has been answered, but is
> there a better parser that should be used at the moment? I was under the
> impression that NCBIXML is the only one.
>
> Thanks,
> Tanya

Hi Tanya,

I hope I can reassure you there is nothing to worry about :)

Right now there is only the NCBIXML parser, and we're not going
to change it (except possibly to make it a little faster if people
want to work on that).

We're planning to add a new module based on Bow's GSoC
code, under the working name SearchIO, which would cover
BLAST, BLAT, HMMER, etc. This would have a different API
and in the long term would probably replace all of Bio.Blast.
http://biopython.org/wiki/SearchIO

The discussion about possible changes has been (I think)
only about this new code (and would have been better suited
to the development mailing list, but this thread went off on
a slight tangent).

Once 'SearchIO' is released, we'd want to encourage
people to use that instead of NCBIXML, with a view to
deprecating and eventually removing NCBIXML. See:
http://biopython.org/wiki/Deprecation_policy

Regards,

Peter

