[BioPython] Regarding GenBank

Brad Chapman chapmanb at 50mail.com
Thu Aug 12 17:02:45 EDT 2004


Hi Sameet;

> I am getting strange problems with the indexing of the GenBank file.  Is 
> there any upper limit on the size of GenBank file that can be indexed.  I am 
> sending the commands i typed on the IDLE and the trace that i got back.  I 
> know that the Genbank files are fine because i downloaded it directly from 
> the net
> 
> >>> from Bio import GenBank
> >>> dict_file = r'C:\Sameet\correspondence\genbank.gb'
> >>> index_file = r'C:\Sameet\correspondence\genbank.idx'
> >>> GenBank.index_file(dict_file, index_file)
> 
> This is the trace that i get.  Am i doing something wrong
> 
> Traceback (most recent call last):
[...]
> ParserPositionException: error parsing at or beyond character 1843

This error just means that the parser could not parse the GenBank
file. To track down the problem we really need to see the genbank.gb
file that you are trying to index. The reported position (character
1843) suggests the problem is very early in the file. Can you
reproduce the error with just the first record of the file? Or at
the very least get it down to one or two records that show the
problem? The key is to provide a minimal GenBank file that
demonstrates the problem.
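
If it helps, here is a rough sketch of one way to cut the file down
to its first record or two. It uses plain Python rather than the
Biopython parser (which is the part that is failing), and it assumes
each record ends with a line containing only "//", the usual GenBank
record terminator. The function names and the output file name are
made up for illustration.

```python
def first_records(lines, count=1):
    """Return the lines making up the first `count` GenBank records.

    Assumption: a record ends at a line that contains only "//".
    If the input holds fewer than `count` records, all lines are returned.
    """
    kept = []
    seen = 0
    for line in lines:
        kept.append(line)
        if line.rstrip() == "//":
            seen += 1
            if seen >= count:
                break
    return kept


def write_first_records(src_path, dst_path, count=1):
    """Copy the first `count` records from src_path into dst_path.

    Returns the number of lines written.
    """
    with open(src_path) as src:
        kept = first_records(src, count)
    with open(dst_path, "w") as dst:
        dst.writelines(kept)
    return len(kept)
```

Something like write_first_records('genbank.gb', 'first_record.gb')
would then give a small file to try indexing again. If the error
goes away, raise count until it comes back.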

Given this, if you could either post a bug with the file as an
attachment (http://bugzilla.open-bio.org/), or send the file as an
attachment to the biopython-dev list, we can take a look and see why
it is failing. GenBank parsing is tricky since new records that
break the parser can come out at any time, so we do need to have a
look. Thanks!

Brad

