[Biopython] NCBI e-utils parser upgrade

Fri Nov 21 09:11:32 UTC 2014

Thanks Ivan,

I'll keep an eye out for any official NCBI announcement. There are
already some Entrez databases returning DTD-less XML files (for
example dbSNP [1]), so if this is going to be more common then
the Bio.Entrez XML parser will need a rethink.

I don't know enough about XML schemas to say how easily we
could use them, in a similar way to the DTD records, to parse
the XML automatically.

Another option might be to just recommend one or more of the Python
standard library functions, with a basic helper function in Bio.Entrez
for working with the NCBI history feature (extract WebEnv and
QueryKey)?
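
As a rough sketch of what such a helper might look like, using only
xml.etree.ElementTree from the standard library. The function name
extract_history is made up for illustration (it is not an existing
Bio.Entrez function), and the sample XML is hand-written to resemble
an ESearch reply requested with usehistory=y, not real NCBI output:

```python
import xml.etree.ElementTree as ET


def extract_history(esearch_xml):
    """Return (WebEnv, QueryKey) from ESearch XML text.

    Hypothetical helper for illustration only. Each value is None
    if the corresponding element is missing from the reply.
    """
    root = ET.fromstring(esearch_xml)
    # findtext returns None when the element is absent
    web_env = root.findtext("WebEnv")
    query_key = root.findtext("QueryKey")
    return web_env, query_key


# Hand-written sample shaped like an ESearch result (assumed layout:
# WebEnv and QueryKey as direct children of the root element).
sample = """<eSearchResult>
  <Count>2</Count>
  <QueryKey>1</QueryKey>
  <WebEnv>EXAMPLE_WEBENV_TOKEN</WebEnv>
  <IdList><Id>111</Id><Id>222</Id></IdList>
</eSearchResult>"""

web_env, query_key = extract_history(sample)
print(web_env, query_key)
```

The two values could then be passed on as the WebEnv and query_key
parameters of a later EFetch call, whichever parser is used for the
fetched records themselves.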

Regards,

Peter

[1] We stopped using BugZilla a long time ago, but for anyone with
a large email archive, the old bug report on this was:
http://bugzilla.open-bio.org/show_bug.cgi?id=2771

On Thu, Nov 20, 2014 at 5:42 PM, Ivan Erill <ivan.erill at gmail.com> wrote:
> Hi all,
>
> As part of my work, I need to deal with the new WP protein records at NCBI
> and, specifically, with the information on their coding sequences. This
> information is returned by E-utils through an integrated protein report
> type of view:
>
> http://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi?db=protein&id=231025&rettype=ipg
>
> which does not use a DTD for the XML, but rather a schema. Although there
> has been no formal announcement, I've been talking to NCBI people and they
> tell me that they will progressively be moving to schemas (which provide
> more fine grained validation specification). Specifically, all new XML
> exports from NCBI will be using schemas. I don't believe that existing DTDs
> are going to be replaced by schemas for now.
>
> My original thought was to branch an update for the current XML parser in
> BioPython, but it looks like using schemas would be a major overhaul of the
> existing code-base and it might make more sense to develop a parallel
> parser, so I first wanted to check which approach you would prefer
> code-wise.
>
> Regards,
>
> Ivan