[BioPython] problem with GenBank.NCBIDictionary ?

Cath Lawrence Cath.Lawrence@anu.edu.au
Thu, 16 May 2002 10:12:14 +1000


On Thursday, May 16, 2002, at 03:18 AM, Jeffrey Chang wrote:
> It looks like GenBank is expecting to get genbank ID's rather than
> accession numbers.  Using your code, if I look for the ID:
>     gb_rec = gb_dict['1617401']
> I get the gene back with accession X98475.

Using GI numbers as the key worked for me.
However, I'm not sure this is the problem for Catherine.

> On Wed, May 15, 2002 at 11:02:24AM +0200, Catherine Letondal wrote:
>> Traceback (most recent call last):
>>   File "<stdin>", line 1, in ?
>>   File "/local/lib/python2.2/site-packages/Bio/GenBank/__init__.py", line 1552, in __getitem__
>>     raise KeyError, "I unexpectedly got back html-formatted data."
>> KeyError: I unexpectedly got back html-formatted data.
>> Is my code incorrect?

I actually got several records containing partial HTML markup from NCBI,
both when downloaded through BioPython and when fetched directly from
the website, so it looks like Entrez is at fault rather than BioPython.
The markup was all anchor tags, mostly in the feature sections. I ended
up stripping them out by hand with sed.
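
For anyone who would rather do that cleanup in Python than in sed, here
is a minimal sketch. It assumes the stray markup is limited to <a ...>
and </a> tags, as it was in the records I saw; strip_anchor_tags and the
sample line are made up for illustration and are not part of BioPython.

```python
import re

# Assumption: the only stray markup is opening and closing anchor tags.
# The \b keeps the pattern from matching other tags that start with "a".
ANCHOR_TAG = re.compile(r"</?a\b[^>]*>", re.IGNORECASE)


def strip_anchor_tags(text):
    """Remove <a ...> and </a> tags, keeping the text between them."""
    return ANCHOR_TAG.sub("", text)


# Hypothetical feature line with an embedded link, for illustration only.
sample = '/db_xref="<a href="http://example.org/x">taxon:9606</a>"'
print(strip_anchor_tags(sample))
```

The cleaned text can then be handed to whatever parser was choking on
the markup.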

With BioPython, GenBank.download_many worked for me, whereas looping
through the list of GIs with NCBIDictionary broke as soon as an
anomalous record turned up.

Here's my little test script:
-------------------------------------
from Bio import GenBank

# This works: download_many calls print_rec once per downloaded record.
def print_rec(record):
    print record

gi_list = GenBank.search_for("Mammalia, complete, genome, mitochondrion")
GenBank.download_many(gi_list, print_rec)

# Getting an unaccountable bug here: somewhere in the middle of the
# list, the lookups start returning HTML instead of plain-text records.
#ncbi_dict = GenBank.NCBIDictionary()
#for gi in gi_list:
#    print ncbi_dict[gi]
-------------------------------------------
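
The commented-out loop dies on the first bad record because the KeyError
shown in the traceback is never caught. Below is a rough sketch of a
loop that skips the bad records and reports them at the end. It is
written in current Python syntax rather than 2.2; fetch_all and
FakeLookup are hypothetical names, and FakeLookup only stands in for
the remote dictionary so the example runs without a network connection.

```python
def fetch_all(lookup, keys):
    """Look up every key, collecting failures instead of stopping early.

    `lookup` is any object that supports lookup[key] and raises KeyError
    for a record it cannot return, as in the traceback quoted above.
    """
    records = {}
    failed = []
    for key in keys:
        try:
            records[key] = lookup[key]
        except KeyError as err:
            # Keep the key and the error message so it can be retried later.
            failed.append((key, err.args[0]))
    return records, failed


class FakeLookup(object):
    """Offline stand-in that fails for one made-up GI."""

    def __getitem__(self, key):
        if key == "2":
            raise KeyError("I unexpectedly got back html-formatted data.")
        return "LOCUS record for GI " + key


records, failed = fetch_all(FakeLookup(), ["1", "2", "3"])
print(sorted(records))
print(failed)
```

With the real dictionary, the keys that end up in the failed list are
the ones worth fetching by hand to see what markup came back.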

hope this helps,
cheers
Cath
Cath Lawrence,                       Cath.Lawrence@anu.edu.au
Scientific Programmer,  Centre for Bioinformation Science,
John Curtin School of Medical Research
Australian National University,  Canberra ACT 0200
ph: Maths (02) 6125 2904;  JCSMR (02) 6125 0417;
mobile: 0421-902694   fax: (02) 61254712