[Bioperl-l] xml sequence download from ncbi

Geer, Lewis (NLM) lewisg@mail.nih.gov
Thu, 24 Aug 2000 10:30:17 -0400


Thanks, Ewan,
The xml you see is pretty stable as it's based on the asn.1 format, which is
mature.  The only changes ought to be bug fixes.

You'd have to ask Jim Ostell about the future of this, as he and Andriy
Klymenko are the authors of much of it.  Jim has discussed creating
another DTD that decouples the XML and ASN.1 as you propose -- this might
have the added benefit of filtering some of the complexity that is purely
historical in nature.  The blast xml was done this way.

If someone does parse the XML and gives comments, this would help a great
deal.

Lewis

> -----Original Message-----
> From: Ewan Birney [mailto:birney@ebi.ac.uk]
> Sent: Thursday, August 24, 2000 9:21 AM
> To: Geer, Lewis (NLM)
> Cc: Bioperl; bioxml-dev@bioxml.org
> Subject: Re: [Bioperl-l] xml sequence download from ncbi
> 
> 
> On Thu, 24 Aug 2000, Geer, Lewis (NLM) wrote:
> 
> > Hi, 
> > 
> > Sequence download using an xml format derived from our asn.1 standard
> > format is now available from Entrez.  For an example, try
> > 
> > http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi?cmd&save=on&view=xml&val=1827915
> > 
> > where val is the sequence gi number.  Note that this xml output is based
> > on our asn.1 records which are both complete and complex -- we may end up
> > making a genbank flatfile-like version, especially since there are small
> > mismatches between the asn.1 and xml languages that make the xml a bit more
> > complex than if xml was our native format.
> 
> Very interesting. Great stuff. I'm going to forward this onto the bioxml
> lists as well.
> 
> The DTD is clearly very asn based (lots of nesting). How stable are parts
> of this XML? What is the "going forward" view of XML from NCBI (is
> there one?) Hmmmm. 
> 
> 
> I would actually suggest that there is not a tight coupling if possible
> between the internal ASN.1 model and the actual XML dumped - I'm trying to
> prevent the foreseeable problems of NCBI wanting to move their ASN.1
> model: however, of course, someone has to write the code for that and
> manage it, and there are arguments to say that this is possibly better
> done in, say, bioperl and NCBI just should make a clear, easy-to-parse
> data dump...
> 
> 
> Any volunteers for writing the bioperl parser for this (?). I suspect
> we won't know really what comments we have on this until someone starts
> bashing out a parser for it.
> 
> 
> > 
> > We'd be interested in seeing comments!
> > 
> 
> Thanks for letting us comment. It is great to see an NIH person on this
> list...
> 
> 
> > Lewis
> > _______________________________________________
> > Bioperl-l mailing list
> > Bioperl-l@bioperl.org
> > http://bioperl.org/mailman/listinfo/bioperl-l
> > 
> 
> -----------------------------------------------------------------
> Ewan Birney. Mobile: +44 (0)7970 151230, Work: +44 1223 494420
> <birney@ebi.ac.uk>. 
> -----------------------------------------------------------------
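
As a rough illustration of the two points discussed in this thread (the
download URL keyed on the gi number, and the deep nesting of the asn.1-derived
XML), here is a minimal Python sketch. The URL is the one quoted in the
message above and is not guaranteed to still be served. The element names in
SAMPLE are hypothetical placeholders, not the real NCBI DTD, and the helper
names entrez_xml_url and flatten are invented for this example.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Base URL exactly as quoted in the thread (may no longer be live).
BASE_URL = "http://www.ncbi.nlm.nih.gov/entrez/viewer.cgi"


def entrez_xml_url(gi):
    """Build the download URL from the thread; val is the sequence gi number."""
    query = urlencode({"save": "on", "view": "xml", "val": gi})
    return f"{BASE_URL}?cmd&{query}"


def flatten(elem, prefix=""):
    """Yield (path, text) for every leaf element that has non-empty text.

    Walking the tree generically avoids hard-coding a deeply nested schema.
    """
    path = f"{prefix}/{elem.tag}"
    children = list(elem)
    if not children:
        text = (elem.text or "").strip()
        if text:
            yield path, text
    for child in children:
        yield from flatten(child, path)


# Hypothetical nested record; the tag names are NOT taken from the NCBI DTD.
SAMPLE = """
<Record>
  <Id><Gi>1827915</Gi></Id>
  <Data>
    <Seq>
      <Length>12</Length>
      <Residues>ACGTACGTACGT</Residues>
    </Seq>
  </Data>
</Record>
"""

if __name__ == "__main__":
    print(entrez_xml_url(1827915))
    for path, text in flatten(ET.fromstring(SAMPLE)):
        print(path, "=", text)
```

Flattening to path/value pairs is only one possible way to get an overview of
a heavily nested document before deciding what a real parser should keep.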