[BioPython] Cannot parse ApE plasmid editor GenBank file

Peter biopython at maubp.freeserve.co.uk
Tue Jun 5 21:11:36 UTC 2007


Chris Fields wrote:
> Note that the presence of the locus name appears to be required  
> according to the GenBank release notes.  There is no optional  
> designation for the LOCUS line (it is mandatory as stated in sec.  
> 3.4.2), and the locus name appears in the line for all records (sec.  
> 3.5.4).  

I agree that valid GenBank files should indeed have a locus name in the 
LOCUS line. Still, if it doesn't cause too many issues, maybe we should 
accept files that lack one as input.

Having just gone over the Biopython code: if the locus name is missing 
but there is nothing else wrong with the LOCUS line, Biopython will raise 
a slightly cryptic AssertionError, "Cannot parse the name and length in 
the LOCUS line".

I could make the parser cope with missing locus names, but on 
reflection, that may just cause worse problems further downstream (e.g. 
trying to index the file). One option is to auto-generate an identifier...
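
As a rough illustration of the auto-generated identifier idea, here is a 
minimal standalone sketch. It is not Biopython's parser code: the function 
name locus_name_and_length and its fallback argument are hypothetical, and 
it assumes the LOCUS line is whitespace separated with the sequence length 
immediately followed by a "bp" or "aa" unit token.

```python
def locus_name_and_length(line, fallback="unnamed_1"):
    """Return (name, length) from a GenBank LOCUS line.

    Hypothetical helper, not part of Biopython. If the locus name is
    missing, the caller-supplied fallback identifier is used instead.
    """
    tokens = line.split()
    if not tokens or tokens[0] != "LOCUS":
        raise ValueError("Not a LOCUS line: %r" % line)

    # Look for the unit token ("bp" or "aa") preceded by a numeric length.
    for i, token in enumerate(tokens):
        if token in ("bp", "aa") and i >= 2 and tokens[i - 1].isdigit():
            length = int(tokens[i - 1])
            # Whatever sits between "LOCUS" and the length is the name.
            name_tokens = tokens[1:i - 1]
            break
    else:
        raise ValueError("Cannot find the sequence length in: %r" % line)

    if not name_tokens:
        # Missing locus name: substitute the fallback identifier.
        return fallback, length
    if len(name_tokens) > 1:
        raise ValueError("Locus name contains spaces in: %r" % line)
    return name_tokens[0], length
```

The caller would have to make sure each fallback is unique within a file 
(for example by numbering the records), otherwise indexing would still 
run into trouble.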

Let's wait and see what Wayne's new version of ApE plasmid editor outputs 
for "GenBank format" - maybe he will include some sort of locus name.

Peter



