[BioPython] GenBank parser

Brad Chapman chapmanb at uga.edu
Wed May 5 13:57:55 EDT 2004


Hi Leighton;

> I've noticed an oddity in the GenBank FeatureParser (CVS installation
> 19/4).  While parsing the Salmonella typhi file NC_003198.gbk, my way of
> dealing with 'gene' tags fell over.  This turned out to be because the
> GenBank file contains entries with valueless tags such as /partial and
> /pseudo.  The current parser concatenates these tags with the following
> tag.

Ah, good catch. Yes, I was dealing with these incorrectly in the
parsing. The problem, briefly, was that Martel generates two
feature_qualifier_name XML tags in a row (pseudo and gene) without
an intervening XML tag. In these cases, the parsing framework
assumes that the two tags hold a single piece of information that
was split up over multiple tags. Since XML can split long reams of
information over multiple tags, this is a safe assumption in most
places, but it falls apart here.
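
To make the failure mode concrete, here is a minimal sketch in plain
Python. It is not the actual Biopython or Martel code: the function
names, the (tag, text) event format, and the "exampleGene" value are
all made up for illustration. Only the tag name
feature_qualifier_name comes from the parser; the
feature_qualifier_description tag name is assumed here.

```python
# Hypothetical event stream: (tag name, text) pairs, as a parser might
# emit them for a feature with a valueless /pseudo qualifier followed
# by a /gene qualifier. The gene value is invented for this example.
EVENTS = [
    ("feature_qualifier_name", "pseudo"),
    ("feature_qualifier_name", "gene"),
    ("feature_qualifier_description", "exampleGene"),
]


def merge_adjacent(events):
    """Buggy strategy: treat consecutive events with the same tag name
    as one piece of text that was split across several tags."""
    merged = []
    for tag, text in events:
        if merged and merged[-1][0] == tag:
            # Same tag as the previous event: concatenate the text.
            merged[-1] = (tag, merged[-1][1] + text)
        else:
            merged.append((tag, text))
    return merged


def split_qualifiers(events):
    """One possible fix: every qualifier-name event starts a new
    qualifier, so a name with no description keeps an empty value."""
    qualifiers = []
    for tag, text in events:
        if tag == "feature_qualifier_name":
            qualifiers.append([text, ""])
        elif tag == "feature_qualifier_description" and qualifiers:
            # Descriptions attach to the most recent qualifier name.
            qualifiers[-1][1] += text
    return [tuple(q) for q in qualifiers]


print(merge_adjacent(EVENTS))
print(split_qualifiers(EVENTS))
```

With the first strategy the two names run together as "pseudogene",
which is the concatenation Leighton saw. With the second, pseudo
comes out as its own qualifier with an empty value, and gene keeps
its description.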

A fix for this was just checked into CVS; the
feature_qualifier_name tags are now handled correctly.

Thanks for the bug report!
Brad

