[Bioperl-l] Re: stale links, EMBL loading

Hilmar Lapp hlapp at gmx.net
Sun Jun 15 22:18:35 EDT 2003


On Sunday, June 15, 2003, at 12:25  AM, Niels Larsen wrote:

> The links
>
> http://bio.perl.org/SRC/branch-1-2/Bio/Tools/Run/WrapperBase.pm
> http://bio.perl.org/bioperl-bugs
> http://bioperl.org/Related.html
>
> and probably others, return error 404.

Where did you find these?

>
> Then, I am looking into bioperl, hope to be able to use it and if so,
> contribute. While trying SeqIO, I got the error below; this error comes
> only when I use Bio::SeqIO from a script where I also invoke my own
> error-catching module which traps this,
>
> use sigtrap qw ( die normal-signals stack-trace any error-signals );
>

As I said in a separate email, this comes from bioperl-db: you can't 
intercept die's, because bioperl-db needs to catch them itself in order 
to react to them.

> The error below I created by including my error-module in the script
>
> bioperl-db/scripts/biosql/load_seqdatabase.pl
>
> Btw, to load a new EMBL/GenBank/DDBJ release in hours instead of
> days, should I write something that creates temporary files (say one
> per .dat file) and loads those in one go, instead of one entry at a
> time .. or does some other solution exist (I couldn't find it)?
>

Look at the CPU-load distribution between the perl process and the RDBMS 
process (mysql? pg? Oracle?). What I get with richly annotated formats 
like genbank is about 0.7 for perl and 0.15-0.3 for the RDBMS process. 
If this is roughly the balance you see, then perl is the bottleneck and 
dividing the input into chunks will help. Don't do one entry per file, 
though; rather, chunk in larger units and then load the chunks in 
parallel processes. E.g. chunk by genbank section (primates, rodents, 
etc., you get the idea).
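
If the files you have are not already divided into convenient sections, 
a rough sketch of chunking on record boundaries could look like the 
following. It is in Python purely for illustration and is not part of 
bioperl or bioperl-db. The function name split_flatfile, the chunk file 
naming, and the default chunk size are all made up for this example; 
the only format fact it relies on is that EMBL/GenBank flat-file 
records end with a line containing just "//".

```python
import os


def split_flatfile(path, out_dir, records_per_chunk=1000):
    """Split an EMBL/GenBank flat file into chunks of whole records.

    A record is taken to end at a line consisting only of "//".
    Returns the list of chunk file paths written, in order.
    (Hypothetical helper; names and defaults are illustrative.)
    """
    os.makedirs(out_dir, exist_ok=True)
    chunk_paths = []
    out = None
    records_in_chunk = 0
    with open(path) as src:
        for line in src:
            if out is None:
                # Skip blank lines between chunks so that trailing
                # whitespace does not create an empty chunk file.
                if not line.strip():
                    continue
                chunk_path = os.path.join(
                    out_dir, "chunk_%04d.dat" % len(chunk_paths))
                chunk_paths.append(chunk_path)
                out = open(chunk_path, "w")
                records_in_chunk = 0
            out.write(line)
            if line.rstrip() == "//":
                # End of one record; close the chunk once it is full.
                records_in_chunk += 1
                if records_in_chunk >= records_per_chunk:
                    out.close()
                    out = None
    if out is not None:
        out.close()
    return chunk_paths
```

Each resulting chunk file holds only complete entries, so it can be 
handed to a separate loader process.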

If a single loader process already saturates the CPUs on the db server, 
then firing up a second one will only degrade performance. This also 
means: don't run too many loaders in parallel, or you will suffer from 
contention for disk I/O and transactional locks.
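
One way to keep the number of simultaneous loaders bounded is a small 
worker pool, sketched here in Python as an illustration only. The names 
run_loaders, make_command and max_parallel are hypothetical; the actual 
loader command line (for instance an invocation of 
load_seqdatabase.pl with whatever options your setup needs) is 
deliberately left to the caller rather than guessed at here.

```python
import subprocess
from concurrent.futures import ThreadPoolExecutor


def run_loaders(chunk_paths, make_command, max_parallel=2):
    """Run one loader process per chunk, at most max_parallel at a time.

    make_command(path) must return the argv list for one loader run;
    it is supplied by the caller (hypothetical hook, not a bioperl API).
    Returns a dict mapping each chunk path to the loader's exit code.
    """
    def run_one(path):
        # Each worker thread just waits on one external process.
        return path, subprocess.run(make_command(path)).returncode

    with ThreadPoolExecutor(max_workers=max_parallel) as pool:
        return dict(pool.map(run_one, chunk_paths))
```

Start with a small max_parallel, watch the load on the db server, and 
only raise it while the RDBMS side still has headroom.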

	-hilmar
-- 
-------------------------------------------------------------
Hilmar Lapp                            email: lapp at gnf.org
GNF, San Diego, Ca. 92121              phone: +1-858-812-1757
-------------------------------------------------------------
