[Biopython-dev] Propose: Adding an alias name (gb) for Genbank in SeqIO

Peter biopython at maubp.freeserve.co.uk
Wed Apr 15 09:40:56 UTC 2009


On Wed, Apr 15, 2009 at 3:05 AM, Sebastian Bassi
<sbassi at clubdelarazon.org> wrote:
> As a follow up to bug 2811 where "gb" is now a valid name in
> Bio.Entrez, ...

Just to note that in Entrez EFetch, using rettype=gb (and the related
rettype=gp for proteins in GenPept format) has always been a valid
argument (and in fact has always been the documented way to get a
GenBank/GenPept file back).

From my point of view it was a nice feature of Entrez EFetch that they
used to (unofficially) support rettype=genbank, which was consistent with
Bio.SeqIO.  I suppose you could all try lobbying the NCBI to put Entrez
EFetch back to the pre-Easter 2009 behavior, but realistically we'll just
have to live with it.

Now that Entrez EFetch doesn't support the unofficial rettype=genbank
argument anymore, we have the current situation where you must use
"gb" (or "gp") for Bio.Entrez but "genbank" for Bio.SeqIO.  I agree this
isn't so nice, but as I wrote on Bug 2811, I'm not keen on having aliases
in Bio.SeqIO (but I may be in a minority here, hence suggesting a
discussion).  On the plus side, EMBOSS offers "gb" (and "ddbj") as
alternative aliases for "genbank", so there is precedent.
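
In the meantime, anyone bothered by the mismatch can bridge it in their
own script.  A minimal sketch, assuming a hypothetical helper called
seqio_format (this is not part of Biopython, and the table only covers
the names discussed here plus FASTA):

```python
# Hypothetical user-side helper; not part of Biopython.
# Maps an Entrez EFetch rettype to the format name Bio.SeqIO expects.
RETTYPE_TO_SEQIO = {
    "gb": "genbank",   # nucleotide GenBank flat file
    "gp": "genbank",   # protein GenPept flat file, same SeqIO format name
    "fasta": "fasta",
}


def seqio_format(rettype):
    """Return the Bio.SeqIO format name for a given EFetch rettype."""
    try:
        return RETTYPE_TO_SEQIO[rettype.lower()]
    except KeyError:
        raise ValueError("No known SeqIO format for rettype %r" % rettype)
```

The returned string would then be passed as the format argument to
Bio.SeqIO, keeping the translation in one place rather than scattered
through the script.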

In a related approach, I suppose we could have Bio.SeqIO take
"genbank" to mean GenBank or GenPept as determined from the file
or the alphabet (as now), and add "gb" meaning (nucleotide) GenBank
files, and "gp" meaning (protein) GenPept files.
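
To make that idea concrete, here is a rough sketch of how the three
names might be told apart.  The table and the check_format function are
purely illustrative assumptions, not how Bio.SeqIO is implemented:

```python
# Hypothetical lookup; names are illustrative, not Bio.SeqIO internals.
# Each format name maps to the molecule type it implies (None = either).
FORMAT_MOLECULE = {
    "genbank": None,       # GenBank or GenPept, decided from the data
    "gb": "nucleotide",    # GenBank only
    "gp": "protein",       # GenPept only
}


def check_format(name, molecule):
    """Return True if a record of this molecule type is allowed under name."""
    expected = FORMAT_MOLECULE[name]
    return expected is None or expected == molecule
```

Under this scheme "gb" and "gp" would be stricter variants rather than
plain synonyms for "genbank".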

But again, having multiple names for the same format breaks the Python
ideal of there being one clear way to do things.

Peter


