[Biopython-dev] Bio.GenBank.LocationParser chokes on misc_feature in Desulfurococcus kamchatkensis 1221n/NC_011766.gbk

Peter Cock p.j.a.cock at googlemail.com
Mon Jul 11 09:38:03 UTC 2011


On Mon, Jul 11, 2011 at 9:34 AM, Tim te Beek <tim.te.beek at nbic.nl> wrote:
> When parsing ftp://ftp.ncbi.nih.gov/genomes/Bacteria/Desulfurococcus_kamchatkensis_1221n_uid59133/NC_011766.gbk
> using SeqIO.read(genbank_file, 'genbank') I get the following
> stacktrace:
>
> ...
>     gbk_records = (SeqIO.read(genbank_file, 'genbank') for genbank_file in genbank_files)
> ...
> Bio.GenBank.LocationParserError:
> order(1078481..1078483,join(1078778,1078800..1078810))
>
> The offending feature is:
> misc_feature    complement(order(1078481..1078483,join(1078778,
>                 1078800..1078810)))
>                 /locus_tag="DKAM_1147"
>                 /note="active site"
>                 /db_xref="CDD:73252"
>
> Could you look into whether this is a bug in the parser or in the input file?
>

That looks like the issue reported in Bug 3197, which turned out to be caused
by invalid GenBank files: https://redmine.open-bio.org/issues/3197

Quoting from: http://www.ncbi.nlm.nih.gov/collab/FT/
>>
>> 3.4.2.2 Operators
>>
>> ...
>>
>> Note : location operator "complement" can be used in combination with
>> either "join" or "order" within the same location; combinations of "join"
>> and "order"  within the same location (nested operators) are illegal.
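
The rule in this note can be checked mechanically. Below is a minimal sketch
in plain Python, independent of Biopython. The function name
nested_operator_violations is hypothetical, and it only tracks operator names
and parentheses rather than fully parsing a location. Applied to the location
from the report above, it flags order() wrapping join():

```python
import re

# Matches the three location operators, bare "(" and closing ")".
TOKEN = re.compile(r"complement\(|join\(|order\(|\(|\)")


def nested_operator_violations(location):
    """Return (outer, inner) pairs where join() and order() are combined.

    Sketch only: it tracks operator names and parentheses and ignores
    positions, '..', '<', '>' and remote accessions.
    """
    stack = []        # operators currently open; None for a bare "("
    violations = []
    for match in TOKEN.finditer(location):
        text = match.group()
        if text == ")":
            if stack:
                stack.pop()
            continue
        name = text[:-1] or None
        if name in ("join", "order"):
            # Flag every enclosing join/order of the other kind.
            for outer in stack:
                if outer in ("join", "order") and outer != name:
                    violations.append((outer, name))
        stack.append(name)
    return violations


print(nested_operator_violations(
    "complement(order(1078481..1078483,join(1078778,1078800..1078810)))"))
# prints [('order', 'join')]
```

complement() wrapping either join() or order() is not reported, matching the
first half of the note.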

Please report this problem with NC_011766.gbk and NC_009142.gbk to
the NCBI; try gb-admin at ncbi.nlm.nih.gov (and could you CC me too?).
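
Before writing to the NCBI, it may help to list every affected feature rather
than only the first one the parser stops on. The following is a rough sketch
in plain Python (hypothetical function names, not part of Biopython). It
assumes the standard flat-file layout (feature keys in column 6, locations and
qualifiers in column 22) and reads only the first record in each file:

```python
def feature_locations(lines):
    """Yield (feature_key, location) pairs from a GenBank feature table.

    Assumes keys start in column 6, locations/qualifiers in column 22, and
    that a wrapped location continues until the first /qualifier line.
    """
    in_table = False
    key = location = None
    in_location = False
    for raw in lines:
        line = raw.rstrip("\n")
        if line.startswith("FEATURES"):
            in_table = True
            continue
        if not in_table:
            continue
        if not line.startswith(" "):
            break  # ORIGIN, CONTIG or // ends the feature table
        if len(line) > 5 and line[5] != " ":
            # A new feature key: emit the previous feature first.
            if key is not None:
                yield key, location
            key = line[5:21].strip()
            location = line[21:].strip()
            in_location = True
        else:
            text = line[21:].strip()
            if text.startswith("/"):
                in_location = False  # qualifiers follow the location
            elif in_location:
                location += text     # continuation of a wrapped location
    if key is not None:
        yield key, location


def mixed_join_order(lines):
    """Yield features whose location uses both join() and order()."""
    for key, location in feature_locations(lines):
        if "join(" in location and "order(" in location:
            yield key, location


if __name__ == "__main__":
    import sys
    # Hypothetical usage: python find_mixed.py NC_011766.gbk NC_009142.gbk
    for path in sys.argv[1:]:
        with open(path) as handle:
            for key, location in mixed_join_order(handle):
                print(path, key, location, sep="\t")
```

A substring test is enough for this particular rule: a location is a single
expression, so any location containing both operators must combine them.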

The next release of Biopython will have a clearer error message in this
situation.

Thank you,

Peter
