[Biopython-dev] EFetch returning ASN.1 not genbank format

Peter biopython at maubp.freeserve.co.uk
Mon Apr 13 17:55:53 UTC 2009


Hi all,

At the end of last week I found test_SeqIO_online.py was failing and
traced this to a change in Entrez EFetch.  EFetch is documented here:
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html

The issue is with EFetch and the undocumented rettype=genbank argument
which we currently use in our documentation and unit tests.  This
isn't an "official" argument in that it isn't listed on their website,
but until recently it returned plain text GenBank files, acting like
the official rettype=gb or gp arguments.  However, as of the end of
last week, EFetch returns the default format instead (ASN.1), causing
test_SeqIO_online.py to fail and rendering some of our examples
misleading.
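
As a rough illustration of how a script could notice this kind of
change, here is a minimal sketch that checks whether returned text
looks like a GenBank flat file.  The function name is hypothetical
(it is not part of Biopython), and it relies only on the fact that a
GenBank record starts with a LOCUS line; anything else, such as ASN.1
text, is treated as "not GenBank".

```python
def looks_like_genbank(text):
    """Return True if the text starts like a plain text GenBank record.

    Hypothetical helper for illustration only.  GenBank flat files
    begin with a LOCUS line, so anything else (e.g. ASN.1 output) is
    reported as not being GenBank format.
    """
    return text.lstrip().startswith("LOCUS")
```

A test could call this on the first few lines read from the EFetch
handle before passing the data to a GenBank parser.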

I emailed the NCBI and received a very prompt reply,

> Dear Colleague,
>
> As the e-Utils continue to be refined our developers sometimes
> address one-off issues, and this was one of them. The 'official'
> parameter for GenBank is rettype=gb. Now if the parameter is not
> correct you will default to ASN.1 in the nucleotide databases. We
> apologize for any inconvenience.
>
> Regards,
>
> Steve Pechous, Ph.D.
> NCBI User Services

I then emailed back (before Easter) to ask if they would reconsider
this change, and have just had a reply:

> Hi Peter,
>
> This will likely not reverse back as the true parameters are laid out
> in the help documents and are now required, so to speak.
>
> Regards,
>
> Steve Pechous, Ph.D.
> NCBI User Services

With hindsight we shouldn't have used rettype="genbank", but it did
seem to make things simpler for our documentation and I really hadn't
expected the NCBI to change this.

I think we have two options:

(1) Add a special case to Bio.Entrez.efetch to map rettype="genbank"
to rettype="gb" (or "gp" for the protein database).  This is simple
and causes least disruption to Biopython users, but is a bad idea in
the long run as it means we are effectively providing our own variant
of the Entrez API.

(2) Update our documentation and unit tests to use rettype="gb" or
"gp" instead of rettype="genbank", and add a special case to
Bio.Entrez.efetch to map rettype="genbank" to rettype="gb" (or "gp"
for the protein database) and issue a warning that the NCBI have
changed their API.  At a later point we might change this warning to
an error.  This would provide a clear transition for end user scripts,
and keep us consistent with the official Entrez API.
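
The special case in option (2) could look something like the sketch
below.  This is only an illustration under my own assumptions: the
helper name normalise_rettype is made up, the warning text is a
placeholder, and this is not the actual Bio.Entrez code.  The idea is
that efetch would pass its db and rettype arguments through this
before building the query.

```python
import warnings


def normalise_rettype(db, rettype):
    """Map the unofficial rettype="genbank" to the official value.

    Hypothetical helper, not part of Biopython.  Returns "gp" for the
    protein database and "gb" otherwise, and issues a warning so that
    end user scripts know to update.  Other values pass through as is.
    """
    if rettype == "genbank":
        official = "gp" if db == "protein" else "gb"
        warnings.warn(
            'The NCBI no longer accept rettype="genbank", '
            'please use rettype="%s" instead.' % official,
            UserWarning,
            stacklevel=2,
        )
        return official
    return rettype
```

Turning the warning into an error later would just mean replacing
the warnings.warn call with raising an exception.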

I favour option (2) here.  Any other thoughts?  Whatever we do should
happen before we release Biopython 1.50.

Peter



