[Bioperl-l] Writing genbank files

David Waring dwaring@u.washington.edu
Fri, 18 Jan 2002 14:54:05 -0800


I have come across a problem with GenBank files using the Perl module
Bio::DB::GenBank. When I get the GenBank sequence from NCBI and write the
sequence out in GenBank format, the LOCUS line is missing the date:

LOCUS       AC104722    24949 bp    DNA             linear       BCT

instead of

LOCUS       AC104722    24949 bp    DNA             linear       BCT 21-DEC-2001

which is what I get when I download the file myself. I don't know if this
represents a problem in reading the file or in writing the file.

Why am I cross-posting this to biojava? Well, the biojava parser dies on
such a file with a message that says that the LOCUS line is too short.

Is the date a required element in the Locus line? Is there consensus on what
constitutes correct format? Has it changed recently?
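
For anyone who wants to check their own files for this symptom, here is a
minimal sketch in Python. It assumes that the date, when present, is the last
token on the LOCUS line and has the DD-MON-YYYY shape seen in the example
above. The name locus_has_date is made up for illustration and is not part of
BioPerl or BioJava, and the sketch says nothing about whether the date is
actually required by the format.

```python
import re

# Assumption: the date, when present, is the last token on the LOCUS line
# and looks like DD-MON-YYYY (e.g. 21-DEC-2001), as in the example above.
DATE_AT_END = re.compile(r"\b\d{2}-[A-Z]{3}-\d{4}\s*$")


def locus_has_date(line):
    """Return True if a LOCUS line ends with a DD-MON-YYYY date."""
    return line.startswith("LOCUS") and DATE_AT_END.search(line) is not None


without_date = "LOCUS       AC104722    24949 bp    DNA             linear       BCT"
with_date = without_date + " 21-DEC-2001"

print(locus_has_date(without_date))  # False
print(locus_has_date(with_date))     # True
```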

David



I also noticed that the biojava parser is very picky about the number of
spaces; delete a few spaces between DNA and linear and it dies too.

	Exception in thread "main" org.biojava.bio.seq.io.ParseException: LOCUS line too short [LOCUS       AC104719    17453 bp    DNA           linear       BCT 21-DEC-2001]
	        at org.biojava.bio.seq.io.GenbankContext.parseLocusLinePost127(GenbankFormat.java, Compiled Code)
	        at org.biojava.bio.seq.io.GenbankContext.processHeaderLine(GenbankFormat.java, Compiled Code)
	        at org.biojava.bio.seq.io.GenbankContext.processLine(GenbankFormat.java, Compiled Code)
	        at org.biojava.bio.seq.io.GenbankFormat.readSequence(GenbankFormat.java, Compiled Code)
	rethrown as org.biojava.bio.BioException: Could not read sequence
	        at org.biojava.bio.seq.io.StreamReader.nextSequence(StreamReader.java, Compiled Code)
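
To illustrate the spacing sensitivity described above, here is a rough Python
sketch of the opposite approach: splitting the LOCUS line on runs of
whitespace rather than reading fixed columns. This is not how the biojava
parser works, and parse_locus is a hypothetical helper. It assumes the token
order seen in this post (LOCUS, name, length, "bp", molecule type, topology,
division, optional date), so a line with an empty field would be misread.

```python
def parse_locus(line):
    """Split a LOCUS line on whitespace instead of fixed columns.

    Assumes the token order shown in this post:
    LOCUS, name, length, 'bp', molecule type, topology, division, [date].
    """
    tokens = line.split()
    if len(tokens) < 7 or tokens[0] != "LOCUS":
        raise ValueError("not a recognisable LOCUS line: %r" % line)
    return {
        "name": tokens[1],
        "length": int(tokens[2]),
        "molecule": tokens[4],
        "topology": tokens[5],
        "division": tokens[6],
        # The date is treated as optional so a date-less line still parses.
        "date": tokens[7] if len(tokens) > 7 else None,
    }


wide = "LOCUS       AC104719    17453 bp    DNA             linear       BCT 21-DEC-2001"
narrow = "LOCUS       AC104719    17453 bp    DNA           linear       BCT 21-DEC-2001"
no_date = "LOCUS       AC104722    24949 bp    DNA             linear       BCT"

# Removing spaces between DNA and linear does not change the result.
assert parse_locus(wide) == parse_locus(narrow)
print(parse_locus(no_date)["date"])  # None
```

The trade-off is that a tokenising reader accepts lines a strict column-based
reader would reject, so it is better suited to diagnosing files than to
validating them.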