[Biopython-dev] New: Uniprot XML parser
Andrea Pierleoni
andrea at biocomp.unibo.it
Tue Sep 14 16:22:06 UTC 2010
Hi Peter,
I've commented on your commits directly on GitHub, basically agreeing with
them.
Parsing PDB structures as positional features was done to capture all the
information in the UniProt file. I do not see any better place than a
SeqFeature for positional information; the only other option is to skip it.
I saw that in your repository you are using the string "uniprot-xml" to call
the parser; however, the format name at the EBI REST and SOAP services
is simply "uniprotxml". Take a look at:
http://www.ebi.ac.uk/Tools/webservices/services/dbfetch_rest
I think it is better to be conservative about this.
I'm still working on SeqIO.index to make a faster implementation. Regular
expressions are really slow, and ElementTree should cope well with this task.
Anyhow, it works with the current implementation, so it's not a big deal.
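
To illustrate the ElementTree idea, here is a minimal sketch, not the actual
SeqIO.index code. It assumes the standard UniProt XML namespace and uses a
hypothetical helper, index_accessions, which streams the file with iterparse
and records where each entry sits by ordinal position. A real index would
more likely store file offsets, which iterparse does not expose directly.

```python
import io
import xml.etree.ElementTree as ET

# Namespace used by UniProt XML files (assumed here for the sketch).
NS = "{http://uniprot.org/uniprot}"


def index_accessions(handle):
    """Map the first accession of each <entry> to the entry's ordinal position.

    Hypothetical helper for illustration only; it is not part of Biopython.
    """
    index = {}
    count = 0
    # "end" events fire once an element has been read completely.
    for _event, elem in ET.iterparse(handle, events=("end",)):
        if elem.tag == NS + "entry":
            accession = elem.find(NS + "accession")
            if accession is not None:
                index[accession.text] = count
            count += 1
            # Drop the entry's children so memory use stays flat on big files.
            elem.clear()
    return index


# Tiny made-up document, just enough to exercise the function.
sample = b"""<uniprot xmlns="http://uniprot.org/uniprot">
<entry><accession>P12345</accession><name>A</name></entry>
<entry><accession>Q67890</accession></entry>
</uniprot>"""

index = index_accessions(io.BytesIO(sample))
```

Streaming avoids both a regular-expression scan and loading the whole tree
into memory, at the cost of still parsing every entry once.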
Andrea
> Hi Andrea,
>
> I've done some work on the plain text swiss parser to handle features,
> and some basic testing to make sure it agrees with the uniprot-xml
> parser. This showed some problems with end locations out by one
> in the XML parser which I believe I was able to resolve. I have also
> commented out the use of the skip_parsing_errors option - it doesn't
> seem to be needed and silent errors are bad.
>
> I have (for the moment) introduced a couple of new position classes
> in Bio.SeqFeature for "?123" where we have a position but it is
> uncertain, and "?" where we don't have a position at all. The latter
> might be handled more elegantly by inferring a Before/AfterPosition
> instead...
>
> Note that for testing purposes, I have disabled your code where
> it builds a SeqFeature for a dbReference - I'm not sure what the
> best plan here is yet.
>
> Could you have a look at my branch please?
>
> http://github.com/peterjc/biopython/commits/uniprot
>
> Thanks,
>
> Peter
>
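
On the off-by-one end locations mentioned in the quoted message: one likely
source, stated here as an assumption rather than a diagnosis of the actual
bug, is the conversion between 1-based inclusive file coordinates and
Python's 0-based half-open slices. The helper name to_python_slice is
hypothetical.

```python
def to_python_slice(begin, end):
    """Convert 1-based inclusive positions to a 0-based half-open (start, stop).

    Only the start shifts by one; the end keeps its numeric value because
    an inclusive 1-based end equals an exclusive 0-based stop.
    """
    if begin < 1 or end < begin:
        raise ValueError("expected 1 <= begin <= end")
    return begin - 1, end


# Made-up sequence: residues 3..5 (1-based, inclusive) are T, A, Y.
seq = "MKTAYIAKQR"
start, stop = to_python_slice(3, 5)
fragment = seq[start:stop]
```

Subtracting one from both ends, or from neither, gives exactly the kind of
end location that is out by one.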