[Bioperl-l] tables missing in mysql biosql instance

Hilmar Lapp hlapp at gmx.net
Thu Feb 20 09:03:22 EST 2003


On Thursday, February 20, 2003, at 06:09  AM, David Guzman wrote:

> Hi:
>
> I executed DROP on the swissprot - biosql db created yesterday. And
> today I have just repeated the process including the --safe flag, with
> the following command:
>
> [david@mandrake scripts]$ perl load_seqdatabase.pl --host localhost
> --dbname swbiosql --dbuser root --dbpass XXXXXX --driver mysql
> --namespace bioperl --safe --format swiss
> /opt/protdb/swissprot/sprot40.dat
>
> I checked the size of the folder containing the db (331M), is better
> than yesterday (24M), but it should be larger (399M) according to the
> HOWTO (my GBank with MySQL).

I would not judge the load by comparing the database size to the size
of the original flat file. Nothing guarantees a direct correlation
between the two, let alone equality, except that the database can't be
10x smaller, of course. Instead, count the rows in the bioentry table
and compare that to the number of entries in the swissprot file.
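
A rough sketch of that comparison, assuming the mysql command-line
client is installed and that every entry in the flat file starts with
an "ID" line. The function names are made up for illustration; the
host, user, file and database names are the ones from the command
quoted above. Note that the bioentry count covers all namespaces in the
database, so the two numbers only line up if swissprot is the only
thing loaded.

```shell
# Count entries in a Swiss-Prot flat file: each entry starts with an "ID" line.
count_flatfile_entries() {
    grep -c '^ID ' "$1"
}

# Count rows in the bioentry table of a BioSQL MySQL database.
# Assumes the mysql client is on the PATH; -p makes it prompt for the password.
count_bioentries() {
    mysql --host localhost --user root -p --batch --skip-column-names \
        -e 'SELECT COUNT(*) FROM bioentry' "$1"
}

# Usage:
#   count_flatfile_entries /opt/protdb/swissprot/sprot40.dat
#   count_bioentries swbiosql
```

The two numbers should differ by roughly the number of entries that
could not be stored.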


> In the screen I obtained similar error
> messages, like:
>
> -------------------- WARNING ---------------------
> MSG: insert in Bio::DB::BioSQL::SpeciesAdaptor (driver) failed, values
> were ("IAP-IL3","a-particle:Mouse intracisternal:Intracisternal
> A-particles:Retroviridae:Retroid viruses:Viruses","11754","Mouse
> intracisternal a-particle","-") FKs ()
> Duplicate entry 'Mouse intracisternal a-particle--' for key 3
> ---------------------------------------------------
>

Yeah, I know about these. The bioperl swissprot parser has trouble
getting this esoteric 'species' right. Do you require these entries?

> -------------------- WARNING ---------------------
> MSG: Could not store P12894:
>

This is what you should watch out for. Capture the output in a log and 
then count:

	$ grep "MSG: Could not store" my.log | wc -l

The count should have no more than two digits. The accession numbers
in those lines will not be in your BioSQL instance, while all others
should be.
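
To get the accessions themselves rather than just the count, something
along these lines might do. It is only a sketch: it assumes the log
lines look exactly like the "MSG: Could not store P12894:" warning
quoted above, and failed_accessions is a made-up helper name.

```shell
# Print the accession from every "MSG: Could not store <accession>:" line.
failed_accessions() {
    sed -n 's/.*MSG: Could not store \([^:]*\):.*/\1/p' "$1"
}

# Usage, with the load output captured first. Both stdout and stderr are
# redirected, on the assumption that the warnings may arrive on either:
#   perl load_seqdatabase.pl [same options as before] > my.log 2>&1
#   failed_accessions my.log | sort -u
```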

> and ...
>
> DBD::mysql::st execute failed: Duplicate entry '101583-178464' for key 
> 1
> at
> /usr/lib/perl5/site_perl/5.8.0/Bio/DB/BioSQL/BasePersistenceAdaptor.pm
> line 402, <GEN0> line 6360694.
>
> --lookup flag would help??? (for "Duplicate entry" complain?).
>

No, it wouldn't. As I said, ignore these messages; they are dealt
with. Unfortunately, you're going to see a number of them (many
entries, for example, reference the same dbxref twice). The only really
important message is "MSG: Could not store XXXX".

With --safe the script only dies if an exception is raised outside of 
the bioperl-db adaptor code, e.g., if the parser dies. You should be 
able to see that by looking at the last stack trace or error message in 
your log.
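
A quick way to triage a load log along these lines, as a sketch only:
summarize_load_log is a hypothetical helper, the 20-line tail is an
arbitrary choice, and the match strings are taken from the messages
quoted in this thread.

```shell
# Summarize a load log: what matters, what can be ignored, and how it ended.
summarize_load_log() {
    log=$1
    # Entries that did not make it into the BioSQL instance.
    echo "could not store: $(grep -c 'MSG: Could not store' "$log")"
    # Duplicate-key complaints, which can be ignored.
    echo "duplicate entry: $(grep -c 'Duplicate entry' "$log")"
    # If the script died, the reason should be near the end of the log.
    echo "--- last 20 lines ---"
    tail -n 20 "$log"
}

# Usage:
#   summarize_load_log my.log
```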

> Then I am checking everything step by step, and I discovered that there
> are 2 tables missing: remote_seqfeature_name and ontology_relationship,
> how can I correct this problem with biosql-schema?.

With the latest before-singapore-change versions you shouldn't need the 
remote_seqfeature_name table. The ontology_relationship table is in 
sql/ontology/biosqldb-ontology-mysql.sql before Singapore, but the 
bioperl API doesn't use it so far. (The table will be in the main DDL 
after.)

Ewan, maybe it's not a bad idea to include in the INSTALL document
instructions on how to interpret the load log.

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------


