[Biopython-dev] New: Uniprot XML parser

Andrea Pierleoni andrea at biocomp.unibo.it
Wed Jan 20 16:57:47 UTC 2010


>
> Something I should have mentioned earlier (I forgot this wasn't
> checked in yet) was feature support in the existing "swiss" plain
> text parser - hopefully we can get that working nicely as part of
> this XML work:
>
> http://bugzilla.open-bio.org/show_bug.cgi?id=2235
>
> Peter
>

I know that the plain text Swiss-Prot parser can parse features, but
the last time I checked, these features were not included in the
SeqRecords generated by Bio.SeqIO.
If the two parsers are to report similar results, then the 'swiss'
format in Bio.SeqIO must report features too.
I made a few changes to the original parser to map data as closely as
possible to the plain text parser (available on github).

However, the big issue is going to be the comment field:
- 1 big string in the plain text parser
- several annotation fields in the XML parser.

I think that obtaining identical results is going to be difficult.
It is hard (and very error prone) to split the big string into many
annotations, and it is also hard to merge many annotations into a
single string...
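To illustrate why the mapping is lossy in both directions, here is a minimal sketch in plain Python. It assumes the comment string uses the flat-file "-!- TOPIC:" markers with the "CC" line prefixes already stripped, and that the XML side can be represented as a dict of topic -> list of texts. The functions split_comment and join_comment are hypothetical and not part of Bio.SeqIO.

```python
import re

# Matches a flat-file comment marker such as "-!- FUNCTION: " and captures
# the upper-case topic name (which may contain spaces).
_TOPIC = re.compile(r"-!- ([A-Z][A-Z ]*?):\s*")


def split_comment(comment):
    """Hypothetical helper: split one big comment string into {topic: [texts]}."""
    topics = {}
    # re.split with a capturing group yields:
    # [text before first marker, topic1, text1, topic2, text2, ...]
    parts = _TOPIC.split(comment)
    for topic, text in zip(parts[1::2], parts[2::2]):
        key = topic.strip().lower()
        # Collapse the original line wrapping into single spaces.
        topics.setdefault(key, []).append(" ".join(text.split()))
    return topics


def join_comment(topics):
    """Hypothetical helper: merge {topic: [texts]} back into one string."""
    return "\n".join(
        f"-!- {topic.upper()}: {text}"
        for topic, texts in topics.items()
        for text in texts
    )
```

Note that a round trip is not exact: grouping by topic reorders repeated topics, and the original line wrapping is discarded, so join_comment(split_comment(s)) generally differs from s.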

Anyhow, unit tests are coming (thanks to Mauro), together with a
detailed comparison between the two parsed SeqRecords.
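A comparison like this could start from a small helper that reports where two annotation dicts disagree. This is only a sketch: it treats each record's annotations as a plain dict, and the name diff_annotations is hypothetical, not an existing Biopython function.

```python
def diff_annotations(plain, xml):
    """Hypothetical test helper: report how two annotation dicts differ.

    plain -- annotations from the plain text ('swiss') parser
    xml   -- annotations from the XML parser
    """
    plain_keys, xml_keys = set(plain), set(xml)
    return {
        # Keys produced by only one of the two parsers.
        "only_plain": sorted(plain_keys - xml_keys),
        "only_xml": sorted(xml_keys - plain_keys),
        # Keys both parsers produce, but with different values.
        "differing": sorted(k for k in plain_keys & xml_keys if plain[k] != xml[k]),
    }
```

In a unit test, one would then assert that "differing" is empty for the keys both parsers are expected to agree on, and review the "only_*" lists (e.g. one comment string versus several comment annotations) by hand.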

Andrea
