[BioSQL-l] load_seqdatabase.pl warnings and errors

Peter biopython at maubp.freeserve.co.uk
Wed May 20 11:34:51 UTC 2009


On Wed, May 20, 2009 at 12:25 PM, michael watson (IAH-C)
<michael.watson at bbsrc.ac.uk> wrote:
>
> We have a winner :)
>
> NC_003992, NC_011452, NC_011451, NC_011450 all share
> at least one reference.
>
> Would changing --flatlookup to --lookup change the behaviour
> so it checks for an existing reference before trying to insert the
> duplicate?
>
> The answer is no :( (see below).
>
> I guess this may need some coding then!

My crude idea for a simple ad-hoc solution would be to remove these
pointless references from the records before loading them into
BioSQL.

One way would be to edit the four GenBank files by hand (e.g. to
remove the shared reference or make each copy unique). You could also
do this in a BioPerl script that loads the records, edits the
references, and then puts them in the database. Personally I use
Python not Perl, so I can't tell you how you might do that with
BioPerl.
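
As a rough illustration of the "edit the files" option, here is a
minimal Python sketch (standard library only) that drops every
REFERENCE block from GenBank flat-file text. It assumes the usual
layout where top-level keywords start in the first column and their
sub-fields and continuation lines are indented. The function name
strip_references and the sample record are made up for this example,
and note that it removes all references, not just the shared ones.

```python
def strip_references(genbank_text):
    """Return GenBank flat-file text with all REFERENCE blocks removed.

    Assumption: a REFERENCE block runs from a line starting with
    "REFERENCE" up to (but not including) the next line that starts
    in the first column with some other keyword.
    """
    kept = []
    skipping = False
    for line in genbank_text.splitlines(keepends=True):
        if line.startswith("REFERENCE"):
            # Start (or continue into) a reference block.
            skipping = True
        elif line[:1] not in (" ", "\t", "\n", "\r", ""):
            # A new top-level keyword (or "//") ends the block.
            skipping = False
        if not skipping:
            kept.append(line)
    return "".join(kept)


# Hypothetical, heavily shortened record used only to show the effect.
SAMPLE = """\
LOCUS       EXAMPLE                   12 bp    DNA     linear   VRL 01-JAN-2000
DEFINITION  Hypothetical record for illustration.
REFERENCE   1  (bases 1 to 12)
  AUTHORS   Doe,J.
  TITLE     Direct Submission
  JOURNAL   Submitted (01-JAN-2000) Example Institute
FEATURES             Location/Qualifiers
     source          1..12
ORIGIN
        1 acgtacgtac gt
//
"""

cleaned = strip_references(SAMPLE)
print(cleaned)
```

The cleaned text could then be written to new files and loaded as
before, leaving the original GenBank files untouched.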

Hilmar may be able to comment from a BioPerl/BioSQL point of view -
clearly CRC collisions of this nature will happen again in future.

Peter


