[Biojava-l] Different implementation of Sequence?
Y D Sun
Yudong.Sun at newcastle.ac.uk
Fri Jun 6 15:56:51 EDT 2003
Hi Thomas,
I downloaded the EMBL files from
http://www.ebi.ac.uk/cgi-bin/genomes/genomes.cgi?genomes=Bacteria
In that list, for example, No. 11 (BA000040) and No. 88 (BA000030) are
large files.
I have found a problem with my PostgreSQL installation: I didn't build
the JDBC driver with it. Could this be the cause of the poor performance?
I will reinstall it. However, the BioSQL code does work without this
build, though slowly of course.
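
To see which JDBC driver the JVM actually picks up, a small check like the
one below might help. It is only a sketch: DriverCheck and isDriverAvailable
are hypothetical names, not part of BioJava or BioSQL, and it assumes the
usual PostgreSQL driver class name, org.postgresql.Driver. It only reports
whether the class can be loaded from the classpath and says nothing about
performance.

```java
// Hypothetical helper, not part of BioJava or BioSQL.
class DriverCheck {

    /** Returns true if the named class can be loaded from the current classpath. */
    static boolean isDriverAvailable(String className) {
        try {
            Class.forName(className);
            return true;
        } catch (ClassNotFoundException e) {
            // The class is not visible to this JVM's class loader.
            return false;
        }
    }

    public static void main(String[] args) {
        // Assumed default: the standard PostgreSQL JDBC driver class name.
        String driver = args.length > 0 ? args[0] : "org.postgresql.Driver";
        String status = isDriverAvailable(driver)
                ? " is on the classpath"
                : " is NOT on the classpath";
        System.out.println(driver + status);
    }
}
```

Running it with the same classpath as the BioSQL program should show whether
a driver jar is already being found from somewhere else.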
George
> -----Original Message-----
> From: Thomas Down [mailto:td2 at sanger.ac.uk]
> Sent: 06 June 2003 14:33
> To: Y D Sun
> Cc: biojava-l at biojava.org
> Subject: Re: [Biojava-l] Different implementation of Sequence?
>
>
> On Fri, Jun 06, 2003 at 10:33:28AM +0100, Y D Sun wrote:
> >
> > > -----Original Message-----
> > > From: Thomas Down [mailto:thomas at derkholm.net]
> > > Sent: 05 June 2003 22:58
> > > To: smh1008 at cus.cam.ac.uk
> > > Cc: Y D Sun; Thomas Down; biojava-l at biojava.org
> > > Subject: Re: [Biojava-l] Different implementation of Sequence?
> > >
> > >
> > > Once upon a time, David Huen wrote:
> > > > On Thursday 05 Jun 2003 6:07 pm, Y D Sun wrote:
> > > > > Having created the indices as following and restarted
> > > > > postmaster, the performance of feature filtering is even
> > > > > worse. Maybe MySQL is a better choice than PostgreSQL.
> > > > > Does anyone have the similar experience?
> > > > >
> > > > Was the access code exactly as you depicted it? ie. only
> > > > filtering on "CDS".
> > > > Also what was the dataset you searched? was it the same
> > > > dataset in both EMBL flat file and BioSQL? What is your
> > > > version of postgresql and what was the platform?
> > > >
> >
> > Yes, that is the exact code I use to filter CDS on one
> > sequence. The same code is used for the same sequence loaded
> > from Embl file and PostgreSQL database. The execution times
> > (for one sequence only) in two cases are highly diverse.
> >
> > I installed PostgreSQL 7.3.2 on Linux 2.4.20.
> >
> > The database contains 10 complete bacterial sequences with
> > length from 2M to 9M (Embl file size 9M to 18M).
>
> Hmmm...
>
> That amount of data definitely ought to be handled without
> any big problems. On the other hand, it's enough that if
> a lot of the database accesses weren't going for indices, it
> could plausibly take a minute or so.
>
> I've attached a new schema file which is compatible with the
> one on the website, but has some extra CREATE INDEX
> statements. Could you try again with that? If that doesn't
> help, it might be worth trying MySQL to get a different datapoint.
>
> Finally, could you point me to one of the EMBL files you use
> (preferably the biggest one), and I'll do some testing at some point.
>
> Thomas.
>