[BioPython] Cannot parse ApE plasmid editor GenBank file
Chris Fields
cjfields at uiuc.edu
Fri Jun 8 12:57:55 UTC 2007
On Jun 8, 2007, at 5:31 AM, Martin MOKREJŠ wrote:
...
>
> In principle I do agree with you but let me emphasize that I fully
> agree with Wayne, who wrote to me yesterday that the GenBank format
> is the way to write down your data, and we often really do not need
> all the fields required for data submission into the GenBank
> database:
...
It does make sense to leave some of those fields out except in cases
where they are needed (with the exception of the '.' fields like
KEYWORDS), but it never made sense to me to have completely blank
fields or leave out the locus name. My guess is that most format
parsers don't look for empty fields (or complain when one is
encountered) b/c empty fields haven't been encountered before; they
were always left out completely. What would work best for all would
be optional validation warnings or a separate validation module for
anyone worried about checking compliance with the GenBank format,
something that hasn't happened yet (and I don't have time to code it!).
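To make the "optional validation warnings" idea concrete, here is a rough sketch in plain Python. It is not part of Biopython or BioPerl. The function names are hypothetical, and it assumes that header keywords start in column 0, that the header ends at FEATURES or ORIGIN, and that a field counts as empty when nothing follows the keyword on its own line.

```python
import warnings

# Assumption: the header stops at the first FEATURES or ORIGIN line.
HEADER_END = ("FEATURES", "ORIGIN")


def find_empty_header_fields(text):
    """Return (line_number, keyword) pairs for header keywords with no value."""
    empty = []
    for number, line in enumerate(text.splitlines(), start=1):
        if line.startswith(HEADER_END):
            break
        if not line or line[0].isspace():
            # Blank, continuation, or indented sub-keyword line: not checked here.
            continue
        parts = line.split(None, 1)
        if len(parts) == 1:
            empty.append((number, parts[0]))
    return empty


def warn_on_empty_fields(text):
    """Emit one warning per empty header field instead of failing the parse."""
    problems = find_empty_header_fields(text)
    for number, keyword in problems:
        warnings.warn(f"line {number}: {keyword} field is empty")
    return problems
```

Because it only warns, a check like this could run alongside a lenient parser without stopping it on files such as the ApE output.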
Wayne, I would say use Martin's advice for the locus name (file name
w/o extension), and if the field allows '.' then add it in, otherwise
it's probably easier to leave the blank fields out completely,
GenBank compliance or not. There are several questionably compliant
files in the GenBank test suite in BioPerl, so this wouldn't be the
first one, and if someone wants a validation system they can try
building one until we have time to do it.
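The advice above could be sketched roughly like this, again in plain Python and not as anything ApE, Biopython or BioPerl actually does. The function name `tidy_header` and the `DOT_FIELDS` set are hypothetical. The test for a missing locus name (the first token after LOCUS is a number followed by "bp") is only a heuristic, and the rebuilt LOCUS line does not preserve the original column alignment.

```python
from pathlib import Path

# Assumption: only KEYWORDS is treated as a field that may hold a bare '.';
# extend this set if other fields allow it.
DOT_FIELDS = {"KEYWORDS"}

# Assumption: the header ends at the first of these lines.
HEADER_END = ("FEATURES", "ORIGIN", "//")


def tidy_header(lines, filename):
    """Fill in a missing locus name, dot the '.' fields, drop other blank fields."""
    locus_name = Path(filename).stem  # file name without extension
    tidied = []
    in_header = True
    for line in lines:
        if line.startswith(HEADER_END):
            in_header = False
        if not in_header or not line or line[0].isspace():
            tidied.append(line)  # body, blank, or continuation line: keep as-is
            continue
        parts = line.split()
        keyword, rest = parts[0], parts[1:]
        # Heuristic: LOCUS followed directly by "<number> bp" has no name.
        name_missing = not rest or (
            len(rest) > 1 and rest[0].isdigit() and rest[1] == "bp"
        )
        if keyword == "LOCUS" and name_missing:
            tidied.append("LOCUS       " + " ".join([locus_name] + rest))
        elif rest:
            tidied.append(line)  # field has a value: keep untouched
        elif keyword in DOT_FIELDS:
            tidied.append(keyword.ljust(12) + ".")
        # Any other empty field is left out completely.
    return tidied
```

For a file saved as `pUC19.ape`, the locus name would become `pUC19`, an empty KEYWORDS line would become `KEYWORDS    .`, and blank DEFINITION or ACCESSION lines would simply be omitted.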
chris