[Bioperl-l] Indexing large databases / BioSQL

Erik er at xs4all.nl
Mon Apr 7 14:36:57 UTC 2008


On Mon, April 7, 2008 14:34, Sendu Bala wrote:
> Bánk Beszteri wrote:
>> Hi Hilmar,
>>
>> it was important to understand that the inconsistency in taxon names is
>> apparently only between the Swissprot entries with "non-standard" names
>> and the contents of the taxonomy tables and that it is best to use a
>> pre-loaded taxonomy, thanks for that! We have now updated to
>> bioperl-live (and bp-db-live, too) and load_seqdatabase.pl seems to have
>> loaded everything OK in ~26 hours (with many of the "The supplied
>> lineage does not start near..." warnings, but no other problems).
>
> Can you provide some examples of these warnings (of the taxons that
> cause them)? If there's anything consistent about them perhaps
> Bio::Species can be improved to accommodate them properly (instead of
> just issuing the warning and getting the classification wrong).
>

I did a Swiss-Prot load a little while ago and saved the output
(UniProtKB/Swiss-Prot Release 55.1 of 18-Mar-2008, I think).

All warnings (and a few errors) for Swiss-Prot are in a file attached to
this bug report:

   http://bugzilla.open-bio.org/show_bug.cgi?id=2474

I suppose the OP will have encountered similar output; I don't think it
depends much on the RDBMS type.

   regards,

   Erik Rijkers





