[Biopython-dev] Location Parser

Peter Cock p.j.a.cock at googlemail.com
Fri Dec 21 13:09:47 UTC 2012


On Tue, Dec 18, 2012 at 12:40 PM, Matthias Bernt <MatatTHC at gmx.de> wrote:
> Dear list,
>
> I have some problems with the GenBank parser in version 1.60. It's again
> nested location strings like:
>
> order(6867..6872,6882..6890,6906..6911,6918..6923,6930..6932,7002..7004,7047..7049,7056..7061,7068..7073,7077..7082,7086..7091,7098..7100,7119..7136,7146..7151,7158..7163,7170..7172,7179..7184,7212..7214,join(7224..7229,8194..8208),8218..8223,8245..8247,8401..8403)
> as found in NC_003048.

Do you have a URL for that? This looks OK to me:
http://www.ncbi.nlm.nih.gov/nuccore/NC_003048.1

Perhaps the entry came from the FTP site?
e.g. one of these files?: ftp://ftp.ncbi.nih.gov/refseq/release/fungi/

> What happens is that the parser stalls. It seems to take forever to match
> _re_complex_compound and never gets to the if statement that checks
> whether order and join appear in the location string.
>
> I suggest moving the if statement before the regular expressions are
> tested.
>
> I remember that I posted something like this before, but I cannot remember
> whether or how this was solved.
>
> Regards,
> Matthias

Where similar odd locations have come up before, in some cases they
did seem to be NCBI bugs. Could you raise a query with the NCBI
for this case please?

If this is valid (which I doubt), then our object model doesn't cope.

If this is invalid, then Biopython should give a warning and skip
this location. Right now I can't find the file to test this (see
query above about where it came from).
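
To illustrate the idea of testing for nested operators with a cheap
string check before any regular expression is tried, here is a minimal
sketch. The names count_compound_operators and check_location are
hypothetical and this is not the actual Biopython code; it also assumes
that more than one join/order/bond operator in a single location string
always means one is nested inside another.

```python
import warnings

# Hypothetical helper names for illustration only; not Biopython's API.
_COMPOUND_OPERATORS = ("join(", "order(", "bond(")


def count_compound_operators(location):
    """Count join/order/bond operators using plain substring counts."""
    return sum(location.count(op) for op in _COMPOUND_OPERATORS)


def check_location(location):
    """Return the location string if it is safe to hand to the regular
    expressions, or None (after issuing a warning) if it looks nested.

    Assumption: a location is a single expression, so two or more
    compound operators imply that one sits inside another.
    """
    if count_compound_operators(location) > 1:
        warnings.warn("Skipping nested compound location: %r" % location)
        return None
    return location
```

With this check done first, a string such as
order(6867..6872,join(7224..7229,8194..8208),8218..8223) would be
skipped with a warning, while join(1..5,10..20) or
complement(join(1..5,10..20)) would pass through unchanged.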

Regards,

Peter



