[Bioperl-l] Re: stale links, EMBL loading

Niels Larsen nel at birc.dk
Mon Jun 16 12:19:24 EDT 2003


> > The links
> >
> > http://bio.perl.org/SRC/branch-1-2/Bio/Tools/Run/WrapperBase.pm
> > http://bio.perl.org/bioperl-bugs
> > http://bioperl.org/Related.html
> >
> > and probably others, return error 404.
> 
> Where did you find these?

I don't remember now, and indeed I forgot to give you the referring page.
But if you enter http://www.bioperl.org at http://validator.w3.org, many
broken links will be listed.
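
For a quick local check of a known list of URLs, something like the
following could also work. It is only a sketch, not how the validator
works: the function names are made up for illustration, and it assumes the
servers answer HEAD requests.

```python
import urllib.error
import urllib.request


def http_status(url, timeout=10):
    """Return the HTTP status code for url, or None if the request fails outright."""
    request = urllib.request.Request(url, method="HEAD")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # e.g. 404 for a stale link
    except (urllib.error.URLError, OSError):
        return None  # DNS failure, refused connection, timeout


def broken_links(urls, get_status=http_status):
    """Return (url, status) pairs for every URL that fails or answers with 4xx/5xx."""
    broken = []
    for url in urls:
        status = get_status(url)
        if status is None or status >= 400:
            broken.append((url, status))
    return broken
```

Calling broken_links() with the three URLs quoted above would list each one
that still fails. The get_status argument is there only so the logic can be
tried without network access.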

> >  Loading each entry at
> > a time (using bioperl-db/scripts/biosq/load_seqdatabase.pl) however
> > takes 1-2 hours (didnt time exactly)
>
> Not sure what you mean here by each entry at a time. If you mean one 
> genbank entry (sequence) at a time, this certainly shouldn't take 1-2 
> hours, nor minutes, nor seconds. I used to get on the order of 3-10 
> entries per second for a database served by Oracle on a not-so-shiny 
> linux box. MySQL supposedly is faster ...

> What costs the time is mostly building up the 
> Bio::Seq+SeqFeature+Annotation object model and populating it for every 
> entry. If you don't want the object model to be built, I wouldn't use 
> bioperl. If you do want it to be built and populated, we'd be grateful 
> for suggestions how to build it faster ...

Sorry, where I wrote "entry" I meant a .dat file (with 100,000 entries). OK,
I should then write a faster parser that skips the bioperl object model and
populates the tables of the biosql schema directly. That way I can hopefully
get my loading done and still use bioperl after that. I will tell you when
I have it.
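
Roughly what I have in mind, as a minimal sketch only: it relies on EMBL
entries ending with a "//" line and on each line starting with a two-letter
code followed by data from column 6. The table entry_demo is a made-up
stand-in, not a real biosql table, and sqlite3 is used only to keep the
example self-contained.

```python
import sqlite3


def iter_embl_records(lines):
    """Yield one dict per EMBL entry, mapping each two-letter line code to its lines."""
    record = {}
    for line in lines:
        line = line.rstrip("\n")
        if line.startswith("//"):  # '//' terminates an entry
            if record:
                yield record
            record = {}
            continue
        code, text = line[:2], line[5:]  # line code, then data from column 6
        if code.strip():  # skip blank lines and indented sequence data
            record.setdefault(code, []).append(text)
    if record:  # tolerate a missing final '//'
        yield record


def load_entries(conn, records, batch_size=1000):
    """Insert accession and description of each record in batches; return the row count."""
    # Hypothetical demo table, NOT part of the biosql schema.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS entry_demo "
        "(accession TEXT PRIMARY KEY, description TEXT)"
    )
    sql = "INSERT INTO entry_demo (accession, description) VALUES (?, ?)"
    batch, total = [], 0
    for rec in records:
        if "AC" not in rec:  # nothing to key the row on
            continue
        accession = rec["AC"][0].split(";")[0].strip()  # primary accession comes first
        description = " ".join(rec.get("DE", []))
        batch.append((accession, description))
        if len(batch) >= batch_size:
            conn.executemany(sql, batch)
            total += len(batch)
            batch = []
    if batch:
        conn.executemany(sql, batch)
        total += len(batch)
    conn.commit()
    return total
```

The point is that no per-entry objects are built: each entry is reduced to
a few strings and written in batches, with a single commit at the end.
Features and annotations are left out here and would need the same
treatment.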

Niels L

------------------------------------------------------------------------

Niels Larsen, Associate Professor
Bioinformatics Research Center (BIRC)
Aarhus University
Hoegh Guldbergsgade 10
DK 8000 Aarhus C
Denmark

Electronic mail: nel at birc.dk

Telephone: +45-8942-3153
Telefax: +45-8942-3077

------------------------------------------------------------------------


