[Biopython-dev] Merging Uniprot XML parser?

Andrea Pierleoni andrea at biocomp.unibo.it
Fri Nov 5 16:43:16 UTC 2010


> On Tue, Oct 19, 2010 at 4:54 PM, Peter <biopython at maubp.freeserve.co.uk> wrote:
> I've now merged this into the trunk (with a git rebase first so the history
> is linear - no branch+merge), and Andrea has agreed to retest it.
> Other testing and comments are most welcome.
>
> Peter
>


I've done a couple of tests using the master Biopython branch.
The uniprot-xml parser successfully parsed the 2010_11 release of UniProt,
containing 522,019 entries.

The plain-text 'swiss' parser took 6 minutes to parse the complete UniProt
flat file on my system (Python 2.6 on a MacBook Pro, Core 2 Duo).
The uniprot-xml parser took 12 minutes to do the same task when using
cElementTree, which looks pretty good to me (compare this to the 8 minutes
I needed to download the gzipped database).
However, it took more than 80 minutes to do the same task using ElementTree,
so be aware that the parser can become very slow without the C library.
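To illustrate where the C library comes in, here is a minimal stdlib-only sketch (not Biopython's actual parser) that streams through a UniProt XML file with ElementTree.iterparse, preferring cElementTree when it can be imported. The namespace URI and the hypothetical count_entries helper are assumptions for illustration only. On Python 2.x, timing the same loop with each import would be one way to see the kind of difference reported above.

```python
import io
import time

# Prefer the C-accelerated ElementTree where it exists as a separate module
# (Python 2.x). On Python 3.3+ xml.etree.ElementTree uses the C accelerator
# automatically when available, and cElementTree was removed in Python 3.9.
try:
    from xml.etree import cElementTree as ElementTree
except ImportError:
    from xml.etree import ElementTree

# Namespace of UniProt XML elements (assumption: UniProtKB XML format).
UNIPROT_NS = "{http://uniprot.org/uniprot}"


def count_entries(handle):
    """Hypothetical helper: stream a UniProt XML file, counting <entry> elements.

    Each entry is cleared once counted, which keeps memory use low
    even for a full release with hundreds of thousands of entries.
    """
    count = 0
    for event, elem in ElementTree.iterparse(handle, events=("end",)):
        if elem.tag == UNIPROT_NS + "entry":
            count += 1
            elem.clear()
    return count


# Tiny in-memory example document with two entries.
sample = b"""<?xml version="1.0"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot"><accession>P00001</accession></entry>
  <entry dataset="Swiss-Prot"><accession>P00002</accession></entry>
</uniprot>"""

start = time.time()
n = count_entries(io.BytesIO(sample))
print(n, "entries parsed in", round(time.time() - start, 3), "s")
```

For a real file you would pass an open binary handle (or a gzip.open handle) instead of the BytesIO object.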

I'm currently also retesting on TrEMBL, but I don't expect any problems.
I have no idea about performance with Jython and similar Python
implementations, or even whether it works there.

Andrea
