[Biopython-dev] [Bug 2745] Bio.GenBank.LocationParserError with a GenBank CON file

Mon Feb 2 11:53:28 EST 2009

http://bugzilla.open-bio.org/show_bug.cgi?id=2745

------- Comment #4 from biopython-bugzilla at maubp.freeserve.co.uk  2009-02-02 11:53 EST -------
(In reply to comment #1)
> Created an attachment (id=1213)
 --> (http://bugzilla.open-bio.org/attachment.cgi?id=1213&action=view) [details]
> Example of a single GenBank CON record that fails

For interest, and as a possible workaround, note that you can download this
GenBank file from Entrez WITH the sequence.  First of all, try this:

>>> from Bio import Entrez
>>> Entrez.email = "A.N.Other@example.com"
>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="genbank",retmode="text").read()
>>> out_handle = open("FA000001.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

This gives the CONTIG line without the actual nucleotides (as in Bruce's
attachment, which I assume came from the NCBI's FTP site).
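
As a quick way to tell the two kinds of file apart, here is a minimal sketch
that scans GenBank-format text for a CONTIG line and for an ORIGIN line (the
keyword that introduces the sequence block).  The function name and the sample
record are made up for illustration and are not part of Biopython:

```python
def describe_genbank_text(text):
    """Return (has_contig, has_origin) for a chunk of GenBank-format text.

    GenBank keywords such as CONTIG and ORIGIN start in the first column,
    so a simple prefix test on each line is enough for this check.
    """
    has_contig = False
    has_origin = False
    for line in text.splitlines():
        if line.startswith("CONTIG"):
            has_contig = True
        elif line.startswith("ORIGIN"):
            has_origin = True
    return has_contig, has_origin


# Made-up record fragment (hypothetical accession), CONTIG line only.
sample = (
    "LOCUS       EXAMPLE\n"
    "CONTIG      join(XX000001.1:1..100)\n"
    "//\n"
)
print(describe_genbank_text(sample))
```

To try it on a downloaded file, pass in the file contents, e.g.
describe_genbank_text(open("FA000001.gbk").read()).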

However, from reading the Entrez documentation, we can get the nucleotides too
by asking for "gbwithparts" instead of "gb" (or its equivalent, "genbank"). 
See
http://www.ncbi.nlm.nih.gov/entrez/query/static/efetchseq_help.html#SequenceDatabases

i.e.
>>> data = Entrez.efetch("nucleotide",id="FA000001",rettype="gbwithparts",retmode="text").read()
>>> out_handle = open("FA000001.gbwithparts.gbk","w")
>>> out_handle.write(data)
>>> out_handle.close()

I was getting some "Service unavailable!" or proxy errors earlier (which
Bio.Entrez wasn't catching; I've updated it in CVS), but this does work, giving
a 12.8 MB file with the full sequence (which includes plenty of runs of N).
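
If you hit similar transient errors, one option is to retry the download a few
times.  The following is only a sketch: fetch_with_retry is a hypothetical
helper, not part of Bio.Entrez, and it assumes the failure reaches your code as
an IOError (or a subclass of it), which may not hold for every kind of error:

```python
import time


def fetch_with_retry(fetch, attempts=3, delay=5.0, sleep=time.sleep):
    """Call fetch() until it succeeds, up to `attempts` times.

    fetch  -- zero-argument callable that performs the download
    delay  -- seconds to wait between attempts
    sleep  -- injectable wait function (handy for testing)

    Assumption: transient failures are raised as IOError; any other
    exception is passed straight through without a retry.
    """
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    for attempt in range(1, attempts + 1):
        try:
            return fetch()
        except IOError:
            if attempt == attempts:
                raise  # out of attempts, let the caller see the error
            sleep(delay)


# Possible usage with the efetch call from this comment (needs network):
#
# from Bio import Entrez
# Entrez.email = "A.N.Other@example.com"
# data = fetch_with_retry(lambda: Entrez.efetch(
#     db="nucleotide", id="FA000001",
#     rettype="gbwithparts", retmode="text").read())
```

Passing the download as a callable keeps the retry logic independent of
Entrez, so the same helper works for either rettype.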
