[Bioperl-l] [BioPython] Cannot parse GenBank file

Chris Fields cjfields at uiuc.edu
Tue Jun 5 16:07:41 UTC 2007


One thing I missed which explains the biopython error: the LOCUS line
is missing the locus identifier (see the NCBI example record link).
This doesn't choke the bioperl parser but it appears to stop the
biopython parser in its tracks (maybe a feature instead of a bug!).

You should try adding a unique identifier (maybe the name of the file  
or record) to the LOCUS line to see if it works:

LOCUS  testfile           6499 bp ds-DNA     linear       02-AUG-2006

The bioperl parser in CVS writes out the correct alphabet when this  
is added:

LOCUS       testfile                6499 bp    ds-DNA  linear   02-AUG-2006

I'll try adding a warning to the bioperl parser for this.
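
As a stopgap on the Python side, the workaround above could be scripted before the file reaches the parser. This is only a rough sketch using the standard library: the helper names add_locus_name and fix_genbank_text are made up for illustration, and it assumes a nameless LOCUS line can be recognised by the length field ("6499 bp") directly following the keyword. It does not try to reproduce the exact GenBank column layout, so the result may still need adjusting for a strict parser.

```python
import re

# Assumption: a LOCUS line with no locus name has the sequence length
# ("<digits> bp") as the first field after the LOCUS keyword.
_NAMELESS_LOCUS = re.compile(r"^LOCUS\s+(\d+\s+bp\b.*)$")


def add_locus_name(line, name):
    """Insert `name` into a LOCUS line that lacks a locus identifier.

    Lines that are not LOCUS lines, or that already carry a name,
    are returned unchanged.
    """
    newline = "\n" if line.endswith("\n") else ""
    match = _NAMELESS_LOCUS.match(line.rstrip("\n"))
    if match is None:
        return line
    # Keep the rest of the line exactly as the exporting tool wrote it.
    return "LOCUS       %s %s%s" % (name, match.group(1), newline)


def fix_genbank_text(text, name):
    """Apply add_locus_name to every line of a GenBank record."""
    lines = text.splitlines(True)  # True keeps the line endings
    return "".join(add_locus_name(line, name) for line in lines)
```

For example, add_locus_name applied to the failing line from the traceback with the name "testfile" gives a line whose second field is "testfile", while a line that already has an identifier passes through untouched.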

chris

On Jun 5, 2007, at 10:28 AM, Chris Fields wrote:

> Martin,
>
> The example file you give in the bioperl bugzilla report has several
> blank annotation lines which may lead to additional problems.  When
> the BioPerl SeqIO parser finds annotation fields (SOURCE, ORGANISM,
> DEFINITION, etc) then it expects there will also be relevant data
> (text descriptions) accompanying it; I assume the BioPython parser
> expects likewise though I may be wrong.
>
> AFAIK the inclusion of field names w/o text isn't GenBank/EMBL-
> compliant.  GenBank records lacking text either have a '.' instead or
> are left out entirely:
>
> http://www.ncbi.nlm.nih.gov/Sitemap/samplerecord.html
>
> We could add a fix but you should probably contact the ApE developers
> and request that field names w/o text be left out or have '.' added.
>
> chris
>
> On Jun 5, 2007, at 9:04 AM, Martin MOKREJŠ wrote:
>
>> Ezequiel Panepucci wrote:
>>>>     genbank entry = parser.parse(fhandle)
>>>
>>> there is a space character between "genbank" and "entry".
>>> It is a syntax error.
>>> I suppose you meant "genbank_entry" ?
>>
>> Yes, the next command was right and has shown the error. Sorry, I
>> forgot
>> to delete the first attempt. ;-)
>>
>>>>> genbank_entry = parser.parse(fhandle)
>> Traceback (most recent call last):
>>  File "<stdin>", line 1, in ?
>>  File "/usr/lib/python2.4/site-packages/Bio/GenBank/__init__.py",
>> line 187, in parse
>>    self._scanner.feed(handle, self._consumer)
>>  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py",
>> line 360, in feed
>>    self._feed_first_line(consumer, self.line)
>>  File "/usr/lib/python2.4/site-packages/Bio/GenBank/Scanner.py",
>> line 835, in _feed_first_line
>>    assert False, \
>> AssertionError: Did not recognise the LOCUS line layout:
>> LOCUS               6499 bp ds-DNA     linear       02-AUG-2006
>>
>>>>>
>>
>> Martin
>> _______________________________________________
>> BioPython mailing list  -  BioPython at lists.open-bio.org
>> http://lists.open-bio.org/mailman/listinfo/biopython
>
> Christopher Fields
> Postdoctoral Researcher
> Lab of Dr. Robert Switzer
> Dept of Biochemistry
> University of Illinois Urbana-Champaign

Christopher Fields
Postdoctoral Researcher
Lab of Dr. Robert Switzer
Dept of Biochemistry
University of Illinois Urbana-Champaign






