[Biopython] NCBI e-utils parser upgrade
Ivan Erill
ivan.erill at gmail.com
Mon Nov 24 16:16:51 UTC 2014
Michiel, Peter,
Thanks for the feedback. Updating startNamespaceDeclHandler seems to be
the logical way to go. I don't have much experience with XML schemas, but I
will give it a try and make a pull request if I get something decent
working.
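
As a starting point, here is a minimal sketch of the kind of information a
namespace handler could collect. It uses only the standard library's
xml.parsers.expat and is not the Bio.Entrez implementation. I am assuming
that Bio.Entrez's startNamespaceDeclHandler is wired to expat's
StartNamespaceDeclHandler callback. The function name find_schema_info, the
sample document and the example.org URL are made up for illustration.

```python
from xml.parsers import expat

# Namespace URI of XML Schema instance attributes such as xsi:schemaLocation.
XSI = "http://www.w3.org/2001/XMLSchema-instance"


def find_schema_info(xml_text):
    """Return (namespaces, schema_locations) found in an XML document.

    Hypothetical helper: it only records where a schema is declared; it does
    not download or parse the schema itself.
    """
    namespaces = {}
    schema_locations = []

    def start_namespace(prefix, uri):
        # expat calls this once per xmlns declaration; prefix is None
        # for a default namespace.
        namespaces[prefix] = uri

    def start_element(name, attrs):
        # With namespace processing enabled, attribute names are reported
        # as "<namespace uri><separator><local name>".
        for local in ("schemaLocation", "noNamespaceSchemaLocation"):
            key = XSI + " " + local
            if key in attrs:
                schema_locations.extend(attrs[key].split())

    # Namespace callbacks are only issued when a separator is given.
    parser = expat.ParserCreate(namespace_separator=" ")
    parser.StartNamespaceDeclHandler = start_namespace
    parser.StartElementHandler = start_element
    parser.Parse(xml_text, True)
    return namespaces, schema_locations


# Made-up document, not actual NCBI output.
SAMPLE = (
    '<Report xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" '
    'xsi:noNamespaceSchemaLocation="http://example.org/report.xsd">'
    "<Item/></Report>"
)

if __name__ == "__main__":
    print(find_schema_info(SAMPLE))
```

A real implementation would then have to fetch and interpret the schema at
the reported location, much as externalEntityRefHandler does for DTD files.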
Ivan
On Thu, Nov 20, 2014 at 8:20 PM, Michiel de Hoon <mjldehoon at yahoo.com>
wrote:
> Hi Ivan,
>
> I am the original author of Bio.Entrez.
> The parser in Bio.Entrez consists of two parts: The XML parser and the DTD
> parser.
> The DTD parser is used to determine how the elements in the XML file
> should be represented in Python.
> To allow schemas, all that is needed is to write a parser for the schema;
> the XML parser is unchanged.
> In Bio/Entrez/Parser.py, you will find the method
> startNamespaceDeclHandler;
> currently it just raises a NotImplementedError.
> If you try the Bio.Entrez parser on your XML file, you will see that this
> error gets raised.
> So all you would have to do is to implement startNamespaceDeclHandler;
> it should parallel externalEntityRefHandler, which parses DTD files,
> though the bulk of the work is done in elementDecl.
> Please let me know if you run into any problems.
>
> Best,
> -Michiel.
>
>
>
>
> --------------------------------------------
> On Fri, 11/21/14, Ivan Erill <ivan.erill at gmail.com> wrote:
>
> Subject: [Biopython] NCBI e-utils parser upgrade
> To: biopython at mailman.open-bio.org
> Date: Friday, November 21, 2014, 2:42 AM
>
> Hi all,
> As part of my
> work, I need to deal with the new WP protein records at NCBI
> and, specifically, with the information on their coding
> sequences. This information is returned by E-utils through
> an integrated protein report type of view:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=231025&rettype=ipg
>
> which does not use
> a DTD for the XML, but rather a schema. Although there has
> been no formal announcement, I've been talking to NCBI
> people and they tell me that they will progressively be
> moving to schemas (which allow finer-grained
> validation). Specifically, all new XML exports
> from NCBI will be using schemas. I don't believe that
> existing DTDs are going to be replaced by schemas for
> now.
> My original
> thought was to branch an update for the current XML parser
> in Biopython, but it looks like using schemas would be a
> major overhaul of the existing code-base and it might make
> more sense to develop a parallel parser, so I first wanted
> to check on what approach you guys would prefer to do
> code-wise.
> Regards,
> Ivan
>