[Biopython-dev] Problems importing GenBank Files with complex LOCATION tags
Nick Loman
n.j.loman at bham.ac.uk
Mon Feb 2 11:54:50 UTC 2009
Hi there,
I'm attempting to import the whole of RefSeq into a BioSQL schema using
the Biopython loader. However, I am encountering problems with records in
the CON division, such as NW_002063152. I am using a stock Biopython 1.49
install.
The problem occurs when parsing complex CONTIG location tags, such as
the following (spacing adjusted for readability):
CONTIG
join(NZ_ABJI01000250.1:1..6235,gap(unk100),
NZ_ABJI01000251.1:1..2827,gap(1420),NZ_ABJI01000252.1:1..1802,
gap(unk100),NZ_ABJI01000253.1:1..2460,gap(unk100),
NZ_ABJI01000254.1:1..12092,gap(639),NZ_ABJI01000255.1:1..1192,
gap(unk100),NZ_ABJI01000256.1:1..5498,gap(unk100),
NZ_ABJI01000257.1:1..20442,gap(unk100),NZ_ABJI01000258.1:1..2364,
gap(511),NZ_ABJI01000259.1:1..17405,gap(unk100),
NZ_ABJI01000260.1:1..2462,gap(570),NZ_ABJI01000261.1:1..3348,
gap(410),NZ_ABJI01000262.1:1..815,gap(196),
NZ_ABJI01000263.1:1..589)
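
To make the structure of that line concrete, here is a minimal sketch in Python of how such a CONTIG expression could be split into sequence segments and gaps. This is not Biopython code: `parse_contig` and the tuple layout it returns are hypothetical names chosen for illustration, and it assumes a flat join(...) of `accession:start..end` items and `gap(...)` items only, with no complement() or nested operators.

```python
import re

# Hypothetical helper, not part of Biopython. Handles only a flat
# join(...) whose items are "accession:start..end" or "gap(...)".
_SEGMENT = re.compile(r"^(?P<acc>[A-Za-z0-9_.]+):(?P<start>\d+)\.\.(?P<end>\d+)$")
_GAP = re.compile(r"^gap\((?P<unk>unk)?(?P<len>\d+)?\)$")


def parse_contig(text):
    """Return a list of ("segment", acc, start, end) and
    ("gap", length, length_is_known) tuples."""
    expr = "".join(text.split())  # drop all whitespace and line breaks
    if expr.startswith("CONTIG"):
        expr = expr[len("CONTIG"):]
    if not (expr.startswith("join(") and expr.endswith(")")):
        raise ValueError("expected a join(...) expression")
    parts = []
    for item in expr[len("join("):-1].split(","):
        seg = _SEGMENT.match(item)
        if seg:
            parts.append(("segment", seg.group("acc"),
                          int(seg.group("start")), int(seg.group("end"))))
            continue
        gap = _GAP.match(item)
        if gap:
            length = int(gap.group("len")) if gap.group("len") else None
            # "unk" marks an estimated length; no number means no length given
            known = length is not None and gap.group("unk") is None
            parts.append(("gap", length, known))
            continue
        raise ValueError("unrecognised CONTIG item: %r" % item)
    return parts
```

Splitting on commas is only safe because neither item type contains one; anything more elaborate would need a real parser.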
I have worked around the problem by rewriting each record during import so
that the CONTIG entry becomes a blank ORIGIN definition, which at least
gets the sequence features imported.
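
For anyone wanting to try something similar, a rough sketch of such a rewrite follows. It is only a guess at the general shape of the workaround, not the exact code used: `blank_contig` is a hypothetical name, and it assumes that CONTIG starts in column 0 and that its continuation lines are indented, as in the flat files described above.

```python
def blank_contig(lines):
    """Yield GenBank lines with each CONTIG block replaced by an empty
    ORIGIN line. Hypothetical pre-processing filter, not Biopython API."""
    skipping = False
    for line in lines:
        if line.startswith("CONTIG"):
            skipping = True
            yield "ORIGIN\n"
            continue
        if skipping and line[:1] in (" ", "\t"):
            continue  # indented continuation of the CONTIG block
        skipping = False
        yield line
```

The filtered lines can then be handed to the parser in place of the original file. Note that a record which already carries its own ORIGIN section would end up with two ORIGIN lines, so this sketch only suits records that have CONTIG alone.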
I realise complex location parsing has been discussed before on this
list - would the authors expect this to parse correctly, or is it out of
the scope of the current code?
Best regards,
Nick.