[Biopython-dev] New: Uniprot XML parser

Andrea Pierleoni andrea at biocomp.unibo.it
Thu Jan 14 18:57:58 UTC 2010


Hi Everyone,
I've been using Biopython a lot over the last couple of years, and it has
been very useful to me. So now it's my turn to contribute and be helpful to
someone else.
I wrote a parser for the Uniprot XML format that is reasonably fast (8000
entries/min on a mainstream Core 2 Duo PC). The main improvements over the
current SwissProt flat file parser are deeper parsing of the comment
fields and a SeqRecord that contains features.

The parser is based on the ElementTree library and was successfully tested
on the complete SwissProt database (v57.12). Thus I think it is ready to
be released.
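
To give reviewers a rough idea of the approach, here is a minimal sketch
of streaming over Uniprot XML entries with ElementTree. It is not the code
from the branch: the function name iter_entries, the dictionary layout and
the two sample entries (with made-up accessions) are assumptions for
illustration only. Only the namespace and the accession/name/sequence
element names come from the Uniprot XML format itself.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by Uniprot XML files.
NS = "{http://uniprot.org/uniprot}"

# Tiny invented sample; the accessions and sequences are not real entries.
SAMPLE = b"""<?xml version="1.0" encoding="UTF-8"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>P00001</accession>
    <name>TEST1_HUMAN</name>
    <sequence length="5">MKTAY</sequence>
  </entry>
  <entry dataset="Swiss-Prot">
    <accession>P00002</accession>
    <name>TEST2_HUMAN</name>
    <sequence length="4">MSEQ</sequence>
  </entry>
</uniprot>
"""


def iter_entries(handle):
    """Yield one dict per <entry>, without loading the whole file."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            # findtext() only looks at direct children of the entry.
            raw_seq = elem.findtext(NS + "sequence") or ""
            yield {
                "accession": elem.findtext(NS + "accession"),
                "name": elem.findtext(NS + "name"),
                # Real files wrap the sequence over lines; drop whitespace.
                "sequence": "".join(raw_seq.split()),
            }
            # Free the children of the finished entry to limit memory use.
            elem.clear()


records = list(iter_entries(io.BytesIO(SAMPLE)))
for rec in records:
    print(rec["accession"], rec["name"], len(rec["sequence"]))
```

Handling each entry on its "end" event and clearing it afterwards is what
makes it practical to run over a file the size of the complete SwissProt
database.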

I followed the guidelines for developing a new SeqIO parser, filed an
enhancement request in Bugzilla (bug 2992), and included the parser in a
public Biopython fork on GitHub, available at:

http://github.com/apierleoni/biopython/tree/uniprotxml-branch

The new parser is in the "uniprotxml-branch" branch, and the parser code
is in Bio/SeqIO/UniprotIO.py

The parser can be used from SeqIO using:

from Bio import SeqIO
iterator = SeqIO.parse(handle, 'uniprot')


I think this could be easily integrated into Biopython. A unit test is
still missing, but it should be very easy to write.
In any case, code review and suggestions are welcome.

Andrea



