[BioPython] Cannot parse ApE plasmid editor GenBank file

Chris Fields cjfields at uiuc.edu
Tue Jun 5 21:46:07 UTC 2007


On Jun 5, 2007, at 4:11 PM, Peter wrote:

> Chris Fields wrote:
>> Note that the presence of the locus name appears to be required   
>> according to the GenBank release notes.  There is no optional   
>> designation for the LOCUS line (it is mandatory as stated in sec.   
>> 3.4.2), and the locus name appears in the line for all records  
>> (sec.  3.5.4).
>
> I agree that valid GenBank files should indeed have a locus name in  
> the LOCUS line. If it doesn't cause too many issues, then maybe we  
> should allow such files as input.
>
> Having just gone over the Biopython code, if the locus name is  
> missing but there is nothing else wrong with the LOCUS line,  
> Biopython will give a slightly cryptic AssertionError, "Cannot  
> parse the name and length in the LOCUS line"
>
> I could make the parser cope with missing locus names, but on  
> reflection, that may just cause worse problems further downstream  
> (e.g. trying to index the file). One option is to auto-generate an  
> identifier...
>
> Let's wait and see what Wayne's new version of ApE plasmid editor  
> outputs for "GenBank format" - maybe he will include some sort of  
> locus name.
>
> Peter

In BioPerl you can optionally pass in a custom generator  
(specifically a code reference) to generate the LOCUS, ACCESSION,  
VERSION, and KEYWORDS lines if needed.  You might be able to do  
something similar for your parser, though I'm not yet familiar enough  
with Python to work out how...

chris
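
For illustration, the callback idea above might look roughly like this in
Python. This is only a sketch under assumptions: parse_locus,
make_counter_generator and the name_generator argument are hypothetical
names, not part of Biopython's parser, and the regular expressions cover
only a simplified LOCUS line layout.

```python
import itertools
import re

# Hypothetical patterns for a simplified LOCUS line; real GenBank files
# have more variants than these two expressions handle.
_LOCUS_WITH_NAME = re.compile(r"^LOCUS\s+(\S+)\s+(\d+)\s+(bp|aa)\b")
_LOCUS_NO_NAME = re.compile(r"^LOCUS\s+(\d+)\s+(bp|aa)\b")


def make_counter_generator(prefix="unnamed"):
    """Return a callable that invents sequential locus names."""
    counter = itertools.count(1)

    def generate(line):
        # The raw LOCUS line is passed in so a custom generator could
        # inspect it; this simple one ignores it.
        return f"{prefix}_{next(counter)}"

    return generate


def parse_locus(line, name_generator=None):
    """Return (name, length) from a LOCUS line.

    If the locus name is missing and a name_generator callable was
    supplied, it is called to produce a substitute name. Otherwise a
    ValueError is raised (an assumption; the thread mentions an
    AssertionError in the existing code).
    """
    match = _LOCUS_WITH_NAME.match(line)
    if match:
        return match.group(1), int(match.group(2))

    match = _LOCUS_NO_NAME.match(line)
    if match and name_generator is not None:
        return name_generator(line), int(match.group(1))

    raise ValueError("Cannot parse the name and length in the LOCUS line")
```

Used with generator = make_counter_generator(), a call such as
parse_locus(line, generator) would fill in a name only when the file
lacks one, so strict behaviour stays the default for callers that do
not pass a generator.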


