[BioPython] Cannot parse ApE plasmid editor GenBank file

Martin MOKREJŠ mmokrejs at ribosome.natur.cuni.cz
Thu Jun 7 14:51:14 UTC 2007


Peter wrote:
> Chris Fields wrote:
>> Note that the presence of the locus name appears to be required  
>> according to the GenBank release notes.  There is no optional  
>> designation for the LOCUS line (it is mandatory as stated in sec.  
>> 3.4.2), and the locus name appears in the line for all records (sec.  
>> 3.5.4).  
> 
> I agree that valid GenBank files should indeed have a locus name in the 
> LOCUS line. If it doesn't cause too many issues, then maybe we should 
> allow such files as input.
> 
> Having just gone over the Biopython code, if the locus name is missing 
> but there is nothing else wrong with the LOCUS line, Biopython will give 
> a slightly cryptic AssertionError, "Cannot parse the name and length in 
> the LOCUS line"
> 
> I could make the parser cope with missing locus names, but on 
> reflection, that may just cause worse problems further downstream (e.g. 
> trying to index the file). One option is to auto-generate an identifier...

I would vote for that. A number of things will break when the LOCUS is the
same for multiple records. But consider my case: I have multiple files with
the same LOCUS identifier (a plasmid name), because plasmids of different
sequence sometimes do share the same abbreviated name. I need to stick to
the original names as published by the authors in the literature, so I
really do have several files with the same identifier in the LOCUS line.
So the internal indexing has to kick in.
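
To illustrate what I mean by the indexing kicking in, here is a minimal
sketch in plain Python. It is not Biopython code; the function name
make_unique_keys and the "_2", "_3" suffix scheme are my own assumptions,
just one possible way to keep records with identical LOCUS names apart.

```python
def make_unique_keys(locus_names):
    """Return one index key per record, in input order.

    The first record with a given LOCUS name keeps that name; later
    records with the same name get a numeric suffix (hypothetical
    scheme: "_2", "_3", ...) so that every key is unique.
    """
    used = set()
    keys = []
    for name in locus_names:
        key = name
        n = 1
        # Keep bumping the suffix until the key has not been used yet.
        while key in used:
            n += 1
            key = "%s_%d" % (name, n)
        used.add(key)
        keys.append(key)
    return keys


# Three different plasmids published under the same abbreviated name.
names = ["pUC19", "pUC19", "pBR322", "pUC19"]
print(make_unique_keys(names))
```

The LOCUS names in the files themselves stay untouched; only the keys used
for indexing are made distinct.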

> 
> Let's wait and see what Wayne's new version of ApE plasmid editor outputs 
> for "GenBank format" - maybe he will include some sort of locus name.

It is being fixed now; some polishing is still needed. But it will produce
GenBank-formatted files according to the current standard.

Martin
BTW, I have proposed that the ApE editor derive the LOCUS identifier from
the filename by stripping the file extension.
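
A rough sketch of that idea in Python, only as an illustration. This is not
what ApE actually does; the function name locus_from_filename is made up,
and replacing whitespace with underscores is my own assumption (a locus
name should not contain spaces).

```python
import os


def locus_from_filename(path):
    """Derive a LOCUS identifier from a file path (hypothetical helper)."""
    base = os.path.basename(path)          # drop any directory part
    name, _ext = os.path.splitext(base)    # strip the last file extension
    # Assumption: collapse whitespace to underscores, since a locus
    # name is expected to be a single token.
    return "_".join(name.split())


print(locus_from_filename("plasmids/pUC19.gb"))
print(locus_from_filename("my plasmid.ape"))
```

Note that two files in different directories can still end up with the same
identifier, so this does not remove the need for unique internal keys.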


