[Biopython-dev] New: Uniprot XML parser

Peter Cock p.j.a.cock at googlemail.com
Thu Jan 14 19:16:49 UTC 2010


On Thursday, January 14, 2010, Andrea Pierleoni <andrea at biocomp.unibo.it> wrote:
> Hi Everyone,
> I've been using Biopython a lot over the last couple of years and it has
> been very useful to me, so now it's my turn to contribute and help someone
> else.
> I wrote a parser for the UniProt XML format that is reasonably fast (8000
> entries/min on a mainstream Core 2 Duo PC). The main improvements over the
> current SwissProt flat-file parser are deeper parsing of the comment
> fields, and a SeqRecord that includes the features.
>
> The parser is based on the ElementTree library and was successfully tested
> on the complete SwissProt database (v57.12). Thus I think it is ready to
> be released.
>
> I followed the guidelines for developing a new SeqIO parser, filed an
> enhancement bug on Bugzilla (bug 2992), and added the parser to a
> public Biopython fork on GitHub, available at:
>
> http://github.com/apierleoni/biopython/tree/uniprotxml-branch
>
> The new parser is in the "uniprotxml-branch" branch, and the parser code
> is in Bio/SeqIO/UniprotIO.py
>
> The parser can be used from SeqIO using:
>
> iterator = SeqIO.parse(handle, 'uniprot')
>
>
> I think this could easily be integrated into Biopython. Unit tests are
> still missing, but they should be very easy to write.
> Anyway, any code review or suggestions are welcome.
>
> Andrea
>
> _______________________________________________
> Biopython-dev mailing list
> Biopython-dev at lists.open-bio.org
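For readers unfamiliar with the approach, a streaming ElementTree parser for UniProt XML can be sketched roughly as below. This is not the code in UniprotIO.py: it assumes the usual `http://uniprot.org/uniprot` default namespace, pulls out only the primary accession and the sequence, and runs on a made-up toy entry.

```python
import io
import xml.etree.ElementTree as ET

# Default namespace of UniProt XML files (assumed; check the root element)
NS = "{http://uniprot.org/uniprot}"

# Made-up toy entry, for illustration only
SAMPLE = b"""<?xml version="1.0"?>
<uniprot xmlns="http://uniprot.org/uniprot">
  <entry dataset="Swiss-Prot">
    <accession>TOY001</accession>
    <sequence length="8">MKTA
YIAK</sequence>
  </entry>
</uniprot>"""


def iter_entries(handle):
    """Yield (accession, sequence) pairs from a UniProt XML handle, one entry at a time."""
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            # The first <accession> child is the primary accession
            accession = elem.findtext(NS + "accession")
            # Drop any whitespace or line breaks inside the sequence text
            seq = "".join((elem.findtext(NS + "sequence") or "").split())
            yield accession, seq
            # Release this entry's children to limit memory use on large files
            elem.clear()


for acc, seq in iter_entries(io.BytesIO(SAMPLE)):
    print(acc, seq)  # TOY001 MKTAYIAK
```

Handling each `entry` on its "end" event and then clearing it is what lets this style of parser get through the whole SwissProt release without holding every record in memory at once.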



Hi

I'd spotted your branch on github - this looks like an excellent
addition to Biopython :)

What I would like to see is a few unit tests, specifically one using
the same record in both XML (with the new parser) and the equivalent
plain text SwissProt file (with the old parser) and check they agree.
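A test along those lines might look like the sketch below. `check_records_agree` is a hypothetical helper, and the `SimpleNamespace` stand-ins replace real records. In practice one record would come from `SeqIO.read(..., "swiss")` on the plain-text file and the other from the new XML parser (registered as `'uniprot'` on the branch). The feature comparison assumes the usual `SeqFeature` attributes `type`, `location.start` and `location.end`.

```python
import unittest
from types import SimpleNamespace


def _feature_keys(record):
    """(type, start, end) for each feature, so coordinate shifts show up as mismatches."""
    return [(f.type, int(f.location.start), int(f.location.end)) for f in record.features]


def check_records_agree(test, xml_record, txt_record):
    """Hypothetical helper: assert that two parses of the same entry agree."""
    test.assertEqual(xml_record.id, txt_record.id)
    test.assertEqual(str(xml_record.seq), str(txt_record.seq))
    test.assertEqual(_feature_keys(xml_record), _feature_keys(txt_record))


class CompareXmlWithFlatFile(unittest.TestCase):
    def test_same_entry(self):
        # Stand-ins; a real test would parse the same entry from an XML file
        # and from the matching SwissProt plain-text file
        chain = SimpleNamespace(type="chain", location=SimpleNamespace(start=0, end=8))
        xml_rec = SimpleNamespace(id="TOY001", seq="MKTAYIAK", features=[chain])
        txt_rec = SimpleNamespace(id="TOY001", seq="MKTAYIAK", features=[chain])
        check_records_agree(self, xml_rec, txt_rec)
```

Running it with `python -m unittest` picks up `CompareXmlWithFlatFile`. Because the helper compares feature coordinates as well as ids and sequences, it would also catch the start-coordinate issue raised below.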

Also, I think you should check that the feature start coordinates use
Python counting (zero-based), since the positions in the XML are one-based.
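UniProt XML gives feature positions as one-based, inclusive `begin`/`end` values (or a single `position`), whereas Python slicing is zero-based with an exclusive end, so only the start needs shifting by one. Here is a minimal sketch of the conversion. It assumes the standard UniProt namespace and ignores fuzzy or unknown positions: a `begin` element with no `position` attribute would raise here and would need separate handling.

```python
import xml.etree.ElementTree as ET

# Default namespace of UniProt XML files (assumed)
NS = "{http://uniprot.org/uniprot}"


def location_to_python(location):
    """Map a UniProt XML <location> (1-based, inclusive) to a
    Python-style (start, end) pair (0-based, end exclusive)."""
    single = location.find(NS + "position")
    if single is not None:
        # Single-residue feature: residue n becomes the slice [n-1:n]
        pos = int(single.get("position"))
        return pos - 1, pos
    begin = int(location.find(NS + "begin").get("position"))
    end = int(location.find(NS + "end").get("position"))
    # Only the start shifts: an inclusive 1-based end equals an exclusive 0-based end
    return begin - 1, end


SEQ = "MKTAYIAK"
loc = ET.fromstring(
    '<location xmlns="http://uniprot.org/uniprot">'
    '<begin position="2"/><end position="4"/></location>'
)
start, end = location_to_python(loc)
print(start, end, SEQ[start:end])  # 1 4 KTA
```

Residues 2 to 4 in UniProt numbering become the slice `SEQ[1:4]`. Forgetting the shift would silently return "TAY" instead of "KTA".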

Regards

Peter



