[Bioperl-l] Added 'Installing bioperl-db in Windows' to wiki, problems with bioperl-db

Thu Feb 16 01:54:01 UTC 2006

On Feb 14, 2006, at 12:32 PM, Chris Fields wrote:

> Hilmar,
>
> Good News: I've added a section to the bioperl wiki on installing  
> bioperl-db
> in Windows:
>
> http://www.bioperl.org/wiki/ 
> Installing_Bioperl_on_Windows#Installing_bioperl
> -db
>
> Bad News:  There's a new problem now. I updated from CVS yesterday; I  
> walked
> through the steps and ran 'nmake test', with everything passing fine.
> However, load_seqdatabase.pl is extremely slow; it's loading a sequence
> every 5 minutes or so.  I noticed (when using '-debug') that it is  
> hanging
> up in Bio::DB::BioSQL::SpeciesAdaptor each time.  If I create a  
> database,
> load the biosql schema, and load sequences w/o loading taxonomy, the  
> problem
> goes away.
>
> Here's the debugging output (I cut it off at the point it hangs up):
> [...]

> preparing UK select statement: SELECT taxon_name.taxon_id, NULL, NULL,
> taxon.ncbi_taxon_id, taxon_name.name, NULL FROM taxon, taxon_name WHERE
> taxon.taxon_id = taxon_name.taxon_id AND name_class = ? AND  
> ncbi_taxon_id =
> ?
> SpeciesAdaptor: binding UK column 1 to "scientific name" (name_class)
> SpeciesAdaptor: binding UK column 2 to "208964" (ncbi_taxid)

I'm a bit surprised if this is the query where it hangs. Are the  
indexes all there? There should be a primary key index on  
taxon.taxon_id, unique indexes on taxon.ncbi_taxon_id and on taxon_name  
over (taxon_id,name,name_class). Also, there should be separate indexes  
on taxon_name.taxon_id and taxon_name.name. Are they all there? If you  
reinstantiated the schema from the DDL then it seems unlikely that  
somehow the indexes have vanished except if you messed with the schema  
or the DDL.

Putting an index on taxon_name.name_class really can't make sense, so  
let's assume it can't be that.

So really I suspect this has something to do with the state of the  
database and the version of MySQL. In particular, from some 4.x version  
of MySQL under certain circumstances you have to analyze the statistics  
of the tables in order to get the optimizer pick up the indexes  
properly. Are you on MySQL 4.x and if so, have you done that?

There's the ANALYZE TABLE command:
http://dev.mysql.com/doc/refman/4.1/en/analyze-table.html

Note the comment: "This statement works with MyISAM, BDB, and (as of  
MySQL 4.0.13) InnoDB tables." Is your MySQL version 4.0.13 or higher?

Also, you can check the execution plan for the query using EXPLAIN.
http://dev.mysql.com/doc/refman/4.1/en/explain.html

This should show you whether the index would be picked up for the query  
or not. EXPLAIN as well as ANALYZE TABLE will need you to connect to  
the db using the mysql shell (mysql).

I believe something similarly strange was encountered by someone using  
DB::GFF (or Chado) under MySQL, and if I recall correctly the solution  
was to optimize (analyze) the tables. Maybe someone who was in that  
thread reads this and can comment?

	-hilmar

>
> ----------------------------------------------------------------------- 
> -----
> -------------------------
>
> Christopher Fields
> Postdoctoral Researcher - Switzer Lab
> Dept. of Biochemistry
> University of Illinois Urbana-Champaign
>
>
>
> _______________________________________________
> Bioperl-l mailing list
> Bioperl-l at lists.open-bio.org
> http://lists.open-bio.org/mailman/listinfo/bioperl-l
>
>
-- 
----------------------------------------------------------
: Hilmar Lapp -:- San Diego, CA -:- hlapp at gmx dot net :
----------------------------------------------------------