[Bioperl-l] Eutils supported

Jason Stajich jason@cgt.mc.duke.edu
Thu, 13 Jun 2002 09:25:54 -0400 (EDT)


With 20 minutes of effort I have migrated to the NCBI e-utils.  As
the current Entrez CGIs will expire in Dec 2002 it is probably good to
start to get these things in the pipeline.  It actually greatly simplifies
the code since we don't have to maintain all the workarounds or extract
text from <pre></pre> blocks.  Commits will be coming in, but only on the
main-trunk.  I have not tested the retrieval of CONTIG entries just yet
but can work on that later on.
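
To illustrate why this is simpler than scraping <pre></pre> blocks, here
is a minimal sketch (in Python rather than Perl, purely for illustration)
of how an efetch request can be put together.  The helper name
build_efetch_url is hypothetical, and the base URL is an assumption: it is
the present-day E-utilities address, not necessarily the one in use when
this was written.  The db, id, rettype and retmode parameters are the
standard efetch ones.

```python
from urllib.parse import urlencode

# Assumption: present-day E-utilities endpoint; the address has changed over time.
EFETCH_BASE = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_efetch_url(db, ids, rettype="gb", retmode="text"):
    """Hypothetical helper: build an efetch URL for one or more record IDs.

    db      -- Entrez database name, e.g. "nucleotide" or "protein"
    ids     -- iterable of GI numbers or accessions
    rettype -- record format, e.g. "gb" (GenBank) or "gp" (GenPept)
    retmode -- "text" returns the flat file directly, with no HTML wrapper
    """
    params = {
        "db": db,
        "id": ",".join(str(i) for i in ids),
        "rettype": rettype,
        "retmode": retmode,
    }
    # Keep commas literal so the ID list stays readable in the URL.
    return EFETCH_BASE + "?" + urlencode(params, safe=",")


if __name__ == "__main__":
    # Only prints the URL; fetching it is left to the caller.
    print(build_efetch_url("protein", [1, 2], rettype="gp"))
```

Because the database is just a parameter, the same helper covers both the
GenBank and GenPept cases; only db and rettype differ between them.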

NCBI has also created a 'genome' db, which I assume will let you be
stupid and try to suck down a whole genome (or perhaps just get GI
numbers), and a combined 'sequence' database spanning the PopSet, Genome,
Protein, and Nucleotide dbs, which means in theory we could use a single
object rather than separate GenPept and GenBank objs to make the request.
However, I am going to leave them separate for now.


Additionally there are PopulationSet data which would be cool to represent
in bioperl if someone is interested in wrapping that.  Also easy access to
the structure databases and taxonomy databases is available through this
tool and it would be a great addition for someone to write the simple
module which links to these databases and populates bioperl objects.

Either Martin has already gotten it working and I've not checked, or it
would be trivial to connect a Bio::Biblio object (a parser has already
been written) to the pubmed datasource available through the e-utils as
well.

AFAIK there is no way to do batch entrez queries such as 'all mRNAs for
Nematodes' or even 'all proteins for flies created after Jan 2002' through
the Eutils interface as there was in the previous batch entrez, but I must
admit to not digging around very much.

http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html

Just to say - if you managed to read this far - I would LOVE for someone
else to take an interest in this.  It's web CGI + Perl parsing and not a
lot of work with the great payoff that you have lots of people using your
code.

-jason
-- 
Jason Stajich
Duke University
jason at cgt.mc.duke.edu