[Biopython-dev] GenBank bug, oriT feature missing
Brad Chapman
chapmanb at uga.edu
Sun Feb 29 17:17:58 EST 2004
Hey guys;
[Mark reports yet another new feature tag added to GenBank files]
> Martel.Parser.ParserPositionException: error parsing at or beyond
> character 1981
>
> After digging into the GenBank code (__init__.py) and then into Martel's
> code, I found I could turn on debugging:
>
> GenBank.FeatureParser(debug_level=2)
>
> I finally see where things die (and what character 1981 means).
>
> for AE000070 there is a feature tag "oriT", which seems to be missing
> from genbank_record.py and __init__.py
[And makes a useful suggestion that others second (and third...)]
> This really isn't a pretty way of dealing with unknown features. Is
> there a way to get this to just pass unknown features?
Yes, I completely agree that this is a pain. The problem is an
unfortunate design decision where the format used to parse the files
uses a hard-coded list of tags. This made sense when it was
originally designed since there are supposed to be a restricted set
of feature and qualifier key names that can be used. Unfortunately,
it's turned into a headache for everyone since NCBI keeps adding
tags.
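To illustrate why a hard-coded list breaks, here is a minimal sketch using Python's standard `re` module rather than Martel's own expression syntax. `KNOWN_FEATURE_KEYS` is a small hypothetical subset chosen for illustration, not the list the real format file used, and `feature_key` is a hypothetical helper. Any key missing from the list, such as `oriT`, makes the whole line fail to match.

```python
import re

# Hypothetical subset of feature keys that a hard-coded format might enumerate
KNOWN_FEATURE_KEYS = ["source", "gene", "CDS", "rep_origin", "misc_feature"]

# Feature lines start with 5 spaces, then the key, then padding before the location.
# The key must be one of the listed alternatives, or the match fails.
HARD_CODED_LINE = re.compile(
    r"^ {5}(" + "|".join(re.escape(k) for k in KNOWN_FEATURE_KEYS) + r") +\S"
)

def feature_key(line):
    """Return the feature key on a GenBank feature line, or raise if the key is not listed."""
    match = HARD_CODED_LINE.match(line)
    if match is None:
        raise ValueError("unrecognized feature line: %r" % line)
    return match.group(1)

print(feature_key("     gene            1..100"))  # prints: gene
# feature_key("     oriT            1..50") raises ValueError: oriT is not in the list
```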
I've decided to get rid of this, and I've just checked in a series of
changes to CVS that update the GenBank format so it shouldn't run
into this problem any longer. Instead of a fixed list of tags, the new
format uses a general regular expression (basically \w, plus some
additional characters that get used, like ' and -).
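A minimal sketch of the general-pattern approach, again in plain Python `re` rather than Martel syntax. The key is matched by `\w` plus `'` and `-`, following the description above; the exact character set in the CVS version may differ, and `FEATURE_LINE` and `feature_key` are hypothetical names. Keys the parser has never seen, such as `oriT`, now pass through.

```python
import re

# Feature lines start with 5 spaces, then the key, then padding before the location.
# The key is any run of word characters plus ' and - (an assumed character set).
FEATURE_LINE = re.compile(r"^ {5}([\w'-]+) +\S")

def feature_key(line):
    """Return the feature key on a GenBank feature line, whatever the key is."""
    match = FEATURE_LINE.match(line)
    if match is None:
        # e.g. qualifier lines, which are indented 21 spaces
        raise ValueError("not a feature line: %r" % line)
    return match.group(1)

print(feature_key("     oriT            1..50"))  # prints: oriT
print(feature_key("     5'UTR           1..20"))  # prints: 5'UTR
```

The trade-off is that a misspelled key is no longer caught by the parser itself; checking keys against the official list, if wanted, becomes a separate validation step.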
In the process of making these changes I've also done a general
cleanup of the format file and merged it with the older format in
Bio.expressions.genbank (which still had plenty of useful bits of
code). I've moved Bio/GenBank/genbank_format.py to
Bio/expressions/genbank.py, so those of you who look at it or
change it (thanks Peter!) now need to look there.
So, long story short -- I hope I fixed this problem for the future.
Please do give the new version in CVS a go and let me know if it has
any problems on your files. Sorry about the pain and thanks for
the report!
Brad