[Biopython-dev] Problems importing GenBank Files with complex LOCATION tags

Nick Loman n.j.loman at bham.ac.uk
Mon Feb 2 06:54:50 EST 2009


Hi there,

I'm attempting to import the whole of RefSeq into a BioSQL schema using 
the Biopython loader. However, I am running into problems with records in 
the CON division, such as NW_002063152. I am using a stock Biopython 1.49 
install.

The problem occurs when parsing complex CONTIG location tags, such as 
the following (spacing adjusted for readability):

CONTIG
    join(NZ_ABJI01000250.1:1..6235,gap(unk100),
    NZ_ABJI01000251.1:1..2827,gap(1420),NZ_ABJI01000252.1:1..1802,
    gap(unk100),NZ_ABJI01000253.1:1..2460,gap(unk100),
    NZ_ABJI01000254.1:1..12092,gap(639),NZ_ABJI01000255.1:1..1192,
    gap(unk100),NZ_ABJI01000256.1:1..5498,gap(unk100),
    NZ_ABJI01000257.1:1..20442,gap(unk100),NZ_ABJI01000258.1:1..2364,
    gap(511),NZ_ABJI01000259.1:1..17405,gap(unk100),
    NZ_ABJI01000260.1:1..2462,gap(570),NZ_ABJI01000261.1:1..3348,
    gap(410),NZ_ABJI01000262.1:1..815,gap(196),
    NZ_ABJI01000263.1:1..589)
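
The join() above is a comma-separated list of accession:start..end pieces 
interleaved with gap(N) entries (known length) and gap(unkN) entries 
(unknown length, with N as a nominal size). The sketch below shows roughly 
what a parser has to handle. It assumes only those two component forms. 
parse_contig_join and contig_length are hypothetical names, not Biopython 
API, and nested operators such as complement() are rejected rather than 
parsed.

```python
import re


def parse_contig_join(contig):
    """Split a CONTIG join(...) string into sequence pieces and gaps.

    Hypothetical helper: handles only accession:start..end, gap(N) and
    gap(unkN) components, not complement() or other nested operators.
    """
    text = "".join(contig.split())  # drop line breaks and indentation
    if not (text.startswith("join(") and text.endswith(")")):
        raise ValueError("expected a join(...) location")
    body = text[len("join("):-1]
    parts = []
    for token in body.split(","):
        gap = re.fullmatch(r"gap\((unk)?(\d+)\)", token)
        if gap:
            # ("gap", length, length_is_known); unkN lengths are nominal
            parts.append(("gap", int(gap.group(2)), gap.group(1) is None))
            continue
        piece = re.fullmatch(r"([A-Za-z0-9_]+\.\d+):(\d+)\.\.(\d+)", token)
        if piece is None:
            raise ValueError(f"unsupported component: {token!r}")
        acc, start, end = piece.groups()
        parts.append(("range", acc, int(start), int(end)))
    return parts


def contig_length(parts):
    """Total assembled length, counting unkN gaps at their nominal size."""
    total = 0
    for part in parts:
        if part[0] == "gap":
            total += part[1]
        else:
            total += part[3] - part[2] + 1  # inclusive 1-based range
    return total


example = """join(NZ_ABJI01000250.1:1..6235,gap(unk100),
    NZ_ABJI01000251.1:1..2827,gap(1420))"""
parts = parse_contig_join(example)
print(parts)
print(contig_length(parts))
```

The code splits on commas after stripping whitespace. That works only 
because the components here are flat. A real location parser would need to 
track parentheses.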

I have worked around the problem by rewriting each record during import 
so that it has an empty ORIGIN section instead of the CONTIG line. This at 
least gets the sequence features imported.
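
The exact rewrite used is not shown above. The following is one plausible 
version, offered only as a sketch. It assumes one GenBank record as plain 
text and that CONTIG continuation lines are indented. blank_out_contig is a 
hypothetical name. Whether Biopython 1.49 then accepts the record with an 
empty sequence has not been checked here.

```python
def blank_out_contig(record_text):
    """Hypothetical pre-processing step: drop the CONTIG block from one
    GenBank record and put an empty ORIGIN line in its place."""
    out = []
    in_contig = False
    for line in record_text.splitlines():
        if line.startswith("CONTIG"):
            in_contig = True
            out.append("ORIGIN")  # empty sequence section, no residues
            continue
        if in_contig and line.startswith(" "):
            # indented continuation lines belong to the CONTIG field
            continue
        in_contig = False
        out.append(line)
    return "\n".join(out) + "\n"


rec = (
    "LOCUS       X\n"
    "FEATURES             Location/Qualifiers\n"
    "     source          1..10\n"
    "CONTIG      join(A.1:1..5,\n"
    "            gap(5))\n"
    "//\n"
)
print(blank_out_contig(rec))
```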

I realise complex location parsing has been discussed on this list 
before. Would the authors expect this to parse correctly, or is it outside 
the scope of the current code?

Best regards,

Nick.
