[Biojava-dev] GenbankFormat (biojavax) and comments with leading whitespace
Bubba Puryear
bubba.puryear at gmail.com
Fri Sep 29 16:22:03 UTC 2006
Hey all,
I've been using biojava for some time now on my project for reading
genbank flat files, but until recently I hadn't been writing any.
Our client makes extensive use of VectorNTI (version 9, I think), and I
was doing some edits to genbank files (via biojavax) when I noticed that
comment values get their whitespace trimmed.
Turns out VNTI splats a load of state that it needs into the comment
section in a fairly Lisp-like syntax... but indentation appears
to be important. In particular, VNTI won't read the files I've edited
that have had their whitespace munged. I have some local changes to
the parser that preserve leading/trailing whitespace in the values of
top-level sections.
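To make the difference concrete, here is a minimal standalone sketch (not the actual biojavax parser code) of trimming versus preserving a section value. It assumes the GenBank flat-file layout where the keyword sits in columns 1-10 and the data starts in column 13, so exactly the first 12 characters are prefix. The class and method names are hypothetical, and the indentation in the sample lines is made up for illustration.

```java
public class CommentValueSketch {
    // GenBank keyword occupies columns 1-10, data starts in column 13,
    // so the first 12 characters of a line are the prefix.
    static final int PREFIX_WIDTH = 12;

    // Hypothetical helper: drop only the fixed-width prefix, keeping any
    // leading/trailing whitespace that belongs to the value itself.
    static String preserveValue(String line) {
        if (line.length() <= PREFIX_WIDTH) {
            return "";
        }
        return line.substring(PREFIX_WIDTH);
    }

    // What a trimming parser effectively does: the indentation is lost.
    static String trimmedValue(String line) {
        return preserveValue(line).trim();
    }

    public static void main(String[] args) {
        // Illustrative lines; the indentation here is invented.
        String[] lines = {
            "COMMENT     (SXF",
            "COMMENT       (CGexDoc \"11460\" 0 6359"
        };
        for (String line : lines) {
            System.out.println("[" + trimmedValue(line) + "] vs ["
                    + preserveValue(line) + "]");
        }
    }
}
```

The point is that only the fixed prefix should be removed; calling trim() on the remainder is what throws away the nesting VNTI relies on.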
I've run the tests locally (and added one for indented comments) and
run this against ~3000 files I have on hand. I wanted to get some
feedback on this before committing, though.
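A round-trip check along the lines of an indented-comment test might look like the sketch below. It is hypothetical: format() and parse() are illustrative stand-ins, not biojavax methods, and the writer repeats the COMMENT keyword on every line because that is how the VNTI output quoted in this message looks. The sample values are invented.

```java
import java.util.ArrayList;
import java.util.List;

public class CommentRoundTripSketch {
    // Keyword padded to 12 columns so the value starts in column 13.
    static final String PREFIX = "COMMENT     ";

    // Hypothetical writer: one COMMENT-prefixed line per value.
    static List<String> format(List<String> values) {
        List<String> lines = new ArrayList<>();
        for (String v : values) {
            lines.add(PREFIX + v);
        }
        return lines;
    }

    // Hypothetical reader: strip exactly the 12-column prefix, nothing more.
    static List<String> parse(List<String> lines) {
        List<String> values = new ArrayList<>();
        for (String line : lines) {
            values.add(line.length() > PREFIX.length()
                    ? line.substring(PREFIX.length()) : "");
        }
        return values;
    }

    public static void main(String[] args) {
        // Invented, VNTI-style values with nested indentation (Java 9+ for List.of).
        List<String> original = List.of(
                "(SXF",
                "  (CGexDoc \"11460\" 0 6359",
                "    (CDBMol 0 0 1)");
        boolean ok = parse(format(original)).equals(original);
        System.out.println(ok ? "round trip preserved indentation"
                              : "indentation lost");
    }
}
```

A trimming reader would fail this check on the second and third values, which is the kind of regression such a test is meant to catch.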
As an example of the kind of thing that currently gets munged:
COMMENT Vector_NTI_Display_Data_(Do_Not_Edit!)
COMMENT (SXF
COMMENT (CGexDoc "11460" 0 6359
COMMENT (CDBMol 0 0 1 1 1 0 0 1633772385 0 "" "" 0 0 0 0
(CObList) (CObList)
COMMENT (CObList) (CObList) -1)
COMMENT (CDocSetData 1 0 0 0 0 0 "MAIN" 1 1 1 1 0 0 1 1 0 1 10 5
40 50 0 1 0
....
The level of indentation can get quite deep.
Thanks,
Bubba